Abstract: The objective of this paper is to develop statistical methodology for planning
and evaluating three-armed non-inferiority trials for general retention of effect hypotheses,
where the endpoint of interest may follow any (regular) parametric distribution family. This
generalizes and unifies specific results for binary, normally and exponentially distributed
endpoints. We propose a Wald-type test procedure for the retention of effect hypothesis
(RET), which assures that the test treatment maintains at least a proportion $\Delta$ of the reference
treatment effect compared to placebo. Here, we distinguish the cases where the variance
of the test statistic is estimated unrestrictedly and restrictedly to the null hypothesis, the
latter in order to improve the accuracy of the nominal level. We present a generally valid sample
size allocation rule to achieve optimal power, and sample size formulas which significantly improve
existing ones. Moreover, we propose a generally applicable rule of thumb for sample allocation and
give conditions under which this rule is theoretically justified. The presented methodologies are
discussed in detail for binary and for Poisson distributed endpoints by means of two clinical
trials in the treatment of depression and in the treatment of epilepsy, respectively. R software
for implementation of the proposed tests and for sample size planning accompanies this
paper.
1. Introduction

The aim of a non-inferiority trial is to demonstrate that the efficacy of a test treatment relative to
a reference one does not fall below a clinically relevant value. For selected fundamental references
we refer to Jones et al. (1996), Röhmel (1998), D'Agostino (2003) and Munk & Trampisch (2005).
In this work we focus on the direct comparison of a test and a reference group. To this end, the
inclusion of a concurrent placebo group is recommended to ensure assay sensitivity of the trial,
provided there are no ethical concerns, i.e. the patients are not harmed by deferral of therapy and
are fully informed about alternatives (see e.g. Temple & Ellenberg (2000) and Hypericum Depression
Trial Study Group (2004)). Such a design, including a (T)est, (R)eference and (P)lacebo group,
has been coined the gold standard design by Koch & Röhmel (2004).

Retention of effect hypothesis: To demonstrate non-inferiority in the gold standard design we 
consider the retention of effect type hypothesis 

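$$
\begin{aligned}
H_0 &: \theta_T - \theta_P \le \Delta \cdot (\theta_R - \theta_P) \\
&\text{vs.} \\
H_1 &: \theta_T - \theta_P > \Delta \cdot (\theta_R - \theta_P),
\end{aligned}
\tag{1.1}
$$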

where $\theta_k \in \Theta \subset \mathbb{R}$, $k = T, R, P$, is the parameter of interest, representing the efficacy of a
treatment, and $\Delta \in [0, \infty)$ is a fixed constant expressing the amount of the active control effect
relative to placebo which should be retained. For a discussion of various issues encountered with
the choice of $\Delta$ we refer to Lange & Freitag (2005) and the references given there, who provide
a systematic review of 332 published non-inferiority studies. Examples for $\theta_k$ are (a) $\theta_k = \pi_k$, the
success probability of a binary endpoint, representing for example whether the patient achieves remission
(Kieser & Friede, 2007), (b) $\theta_k = \lambda_k$, the expectation of an exponentially distributed endpoint,
representing for example the time until healing or remission (Mielke et al., 2008), and (c) $\theta_k = \mu_k$, the
expectation of a normally distributed endpoint, representing for example the FVC (forced vital
capacity) in a trial on mildly asthmatic patients (Pigeot et al., 2003). Note that in this setup we
presume that large values of $\theta_k$ are associated with higher efficacy of the treatment. Compared to
absolute hypotheses, e.g. $H_0 : \theta_T \le \theta_R - \Delta$ with $\Delta > 0$, the advantage of the hypothesis (1.1)
is that it is invariant with respect to rescaling or shifts of the parameters $\theta_k$, i.e. the margin $\Delta$
need not be readjusted when the parametrization changes. Thus, the margin $\Delta$ is standardized
in that sense and can therefore easily be compared across different hypotheses and applications.
Further, it has an intuitive and clear interpretation: rejecting $H_0$ amounts to claiming
that the test treatment achieves at least $\Delta \cdot 100\%$ of the active control effect, where both are
compared relative to placebo. Rewriting the alternative in (1.1) as

$$ H_1 : \theta_T > \Delta \cdot \theta_R + (1 - \Delta) \cdot \theta_P $$

illustrates that in this case the test treatment effect is greater than a convex combination of the
reference and the placebo effect if $0 \le \Delta \le 1$. This includes two extremal cases: For $\Delta = 1$ we
obtain superiority of the test treatment to the reference one (at least $\Delta = 100\%$ of the reference
effect is retained) and for $\Delta = 0$ superiority of the test treatment to placebo.

As mentioned above, for binary endpoints a typical choice is $\theta_k = \pi_k$, the success probability.
However, in practical applications transformations of the success probability are also of interest, e.g.
$\log(\pi_k)$, $\pi_k/(1 - \pi_k)$, $\log(\pi_k/(1 - \pi_k))$ or just $-\pi_k$ in the case of a mortality rate. For a comprehensive
discussion of several hypotheses for binary endpoints see Röhmel & Mansmann (1999). In order
to formalize this we modify the hypothesis (1.1) to

$$
\begin{aligned}
H_{0,h(\theta_k)} &: h(\theta_T) - h(\theta_P) \le \Delta \cdot (h(\theta_R) - h(\theta_P)) \\
&\text{vs.} \\
H_{1,h(\theta_k)} &: h(\theta_T) - h(\theta_P) > \Delta \cdot (h(\theta_R) - h(\theta_P)),
\end{aligned}
\tag{1.2}
$$

where $\theta_k \in \Theta \subset \mathbb{R}^d$, $k = T, R, P$, determines the distribution of our endpoints of interest. Here,
$h(\cdot)$ is a differentiable, strictly monotone, real-valued function on the parameter space measuring
the efficacy of a treatment, where larger values of $h(\cdot)$ correspond to higher efficacy. In the
following, we will omit the alternatives and only state the null hypotheses.

Aim and scope: The aim of this work is to provide a general testing methodology based on Wald's
maximum likelihood asymptotics for the general retention of effect hypotheses (1.2). This, among
others, includes the above mentioned situations as special cases. In addition, we obtain tests for
Poisson distributed endpoints (for a careful discussion see Section 2.2). Moreover, we discuss the issue
of sample size planning and we provide, in large generality, formulas for the optimal allocation of samples
and accurate approximations for the determination of sample sizes in order to guarantee a certain
power. We show that this requires the computation of the Kullback-Leibler divergence minimizer in
the null hypothesis to an alternative model.

Complete test procedure: To ensure assay sensitivity of the test procedure the hypothesis (1.1) 
is typically embedded in a complete test procedure, where in a first step a pretest for superiority 
of either the reference or the test treatment to placebo is performed, and in a second step the non- 
inferiority is investigated via (1.1). There is a vigorous discussion on which pretest is appropriate. 
For example, Pigeot et al. (2003) carry out a pretest for superiority of the reference treatment
to placebo, whereas Koch & Röhmel (2004) perform the test for the test treatment to placebo,
because the test treatment should not be blamed when the reference treatment could not beat
placebo (Koch, 2005).

It is important to note that, as a common rule, the pretest is subordinate in
the complete test procedure, in the sense that sample size planning can be performed via the
non-inferiority test without adjustment for the pretest for superiority (see e.g. Mielke et al., 2008). This
means that the power of the non-inferiority test nearly coincides with the power of the complete test
procedure for commonly used alternatives. In addition, the pretest represents a well-investigated
testing problem where the parameters of comparison coincide on the boundary of the hypothesis.
Thus, in the following we focus only on the non-inferiority hypothesis (1.1) and keep the complete
test procedure in the back of our minds.

Mielke & Munk / Evaluating and planning the RET

State of research: Closely related to the retention of effect hypothesis (1.1) is the hypothesis
where the treatment effect $\theta_T - \theta_R$ is evaluated relative to a historical active control effect $\theta_R - \theta_P$,
which therefore cannot be estimated concurrently. For a comprehensive discussion we refer to
Holgrem (1999), Hauck & Anderson (1999), Hasselblad & Kong (2001), Rothmann et al. (2003)
and Hung, Wang & O'Neill (2009). The most problematic issue of such a design is the necessity to
project the active control effect into the current non-inferiority trial setting (Hung, Wang & O'Neill,
2009). This issue is not present in the gold standard design, where the active control effect is
estimated concurrently.

A nonparametric version of the retention of effect hypothesis (1.1) was already considered by
Koch & Tangen (1999). Pigeot et al. (2003) consider (1.1) for normally distributed endpoints.
Subsequently, this type of hypothesis was discussed vigorously (see e.g. Hauschke & Pigeot, 2005) and
investigated for different types of endpoints. Koch & Röhmel (2004) and Schwartz & Denne (2006)
also consider normally distributed endpoints and investigate (1.1) with $\theta_k$ equal to the expectation
$\mu_k$ of the groups $k = T, R, P$, respectively, under homogeneity of variance between the groups.
Hasler et al. (2008) and Dette et al. (2009) extend these results to the case of heterogeneity of the
group variances. Mielke et al. (2008) consider censored, exponentially distributed endpoints. Tang
& Tang (2004) and Kieser & Friede (2007) investigate binary endpoints with $\theta_k$ equal to the success
probability $\pi_k$ of each group. In contrast to the normal and exponential case, sample size planning
for binary endpoints leaves open questions. In particular, the existing sample size formulas lack
precision, i.e. there is a deviation between the exact and the aspired power (cf. Kieser & Friede, 2007). The
additional difficulties for binary endpoints are mainly due to the dependency of the variance on the
parameters of interest, the success probabilities. In this work we provide a general approach
for general parametric models which closes this gap for binary endpoints as a special case.

Content and organization: This paper is organized as follows. In Section 2, we discuss two
clinical trials: first, a trial in the treatment of depression investigating whether the patients achieve
remission at the end of treatment (binary endpoints), measured by the Hamilton rating scale score of
depression (HAM-D), and second, a study in the treatment of epilepsy investigating the number
of seizures (Poisson distributed endpoints). In Section 3, we present the general theory and derive
a Wald-type test procedure for the generalized retention of effect hypothesis (1.2), which we denote
as Retention of Effect Wald-type Test (RET) in the following. In Section 4, we derive sample size
formulas and the (asymptotically) optimal allocation for planning a three-armed retention of effect
trial. In particular, we include the important case where the variance is estimated restrictedly to
the null hypothesis. This often improves the asymptotic approximation under the null hypothesis
(see e.g. Farrington & Manning (1990) and Tang, Tang & Wang (2007)) and therefore is very
popular in practice. For the presented sample size formulas we have determined the exact limit
of the restricted ML-estimator, which has not been considered so far. As a major result this
significantly improves the precision of the formulas; see, for example, Table 5. The optimal allocation
when the variance is estimated unrestrictedly turns out to be

$$ n_T : n_R : n_P = 1 : \Delta \frac{\sigma_{0,R}}{\sigma_{0,T}} : |1 - \Delta| \frac{\sigma_{0,P}}{\sigma_{0,T}}, \tag{1.3} $$

where $\sigma_{0,k}^2$ is the variance within group $k$, $k = T, R, P$, under the alternative, specified later on
in (3.6). Here, $n_k$ denotes the number of samples assigned to group $k = T, R, P$. This is shown
to be valid in (essentially) any parametric family. Although the asymptotic power will change in
general when the variance is estimated restrictedly, we argue that the optimal allocation remains
unchanged in an asymptotic sense even when the variance is estimated restrictedly to the null
hypothesis. As the optimal allocation (1.3) depends on the choice of the alternative, we show in
Section 4.1.1 that one may use the allocation $1 : \Delta : (1 - \Delta)$ as a very general rule of thumb,
which is more appropriate in terms of power than the commonly used allocation 2:2:1 as well as
the balanced allocation, if $\sigma_{0,P}^2 / \sigma_{0,T}^2$ is (roughly) less than 2. It is important to note that this result
is valid in great generality, independent of the distribution of the endpoints and of the formulation of
the hypothesis (1.2).

In Section 5, we will revisit our examples introduced in Section 2 to demonstrate and to discuss
the results of the previous sections in detail. We show that sample size reductions, and thereby
reductions in the costs of a trial, of up to 20% and more are possible by reallocating to the
optimal allocation instead of a balanced or the commonly used 2:2:1 allocation. In particular, it
turns out that our sample size formula for binary endpoints significantly improves the precision of the existing
one by Kieser & Friede (2007), in the sense that the exact power is close to the aspired
one. In Section 6, we briefly comment on the R software for analysis and planning of the RET, which
we provide as supplementary material, in order to allow the reader to reproduce the presented
results and to make the presented methodology directly applicable. Finally, we conclude with a
discussion in Section 7.

2. Examples 

In this section, we introduce two clinical non-inferiority trials, one in the treatment of epilepsy 
and the other one in the treatment of depression, and we define retention of effect hypotheses, 
which are of interest within these examples. 

2.1. Binary endpoints: Treatment of depression 

Binomial or binary endpoints, respectively, are most commonly used in non-inferiority trials (Lange
& Freitag, 2005). In this section we introduce a clinical trial in the treatment of depression from
Goldstein et al. (2004), which was also used by Kieser & Friede (2007) for illustration. In particular,
we will arrive at different answers concerning the planning of this study (see Section 5.1). This
randomized, double-blind trial compares duloxetine (Test treatment) to paroxetine (Reference
treatment) and Placebo with regard to efficacy and safety. In the therapy of depression, achieving
remission is the clinically desired goal (Nierenberg & Wright, 1999), where remission is defined
as maintaining the Hamilton rating scale score of depression (HAM-D) total score at $\le 7$. Table
1 displays for each group, $k = T, R, P$, the total number of patients and the fraction of patients
who achieved remission at week 8 (end of treatment).



Table 1
Three-armed clinical trial in treatment of depression

Treatment    No. of patients    No. of patients           Fraction of patients
                                who achieved remission    who achieved remission
Placebo      88                 26                        29.55%
Reference    84                 31                        36.90%
Test         86                 43                        50.00%



For demonstrating that duloxetine is non-inferior to the reference treatment paroxetine, following
Kieser & Friede (2007), we consider the retention of effect hypothesis with $h(\pi_k) = \pi_k$

$$ H_{0,\pi_k} : \pi_T - \pi_P \le \Delta \cdot (\pi_R - \pi_P), \tag{2.1} $$

where $\pi_k$ represents the remission probability of treatment $k = T, R, P$ at the end of treatment.

2.2. Poisson endpoints: Treatment of epilepsy 

Typical examples of Poisson distributed endpoints can be found for example in the treatment of
angina pectoris, nausea and epilepsy, see Layard & Arvesen (1978), where the number of attacks
is counted within a specified time interval, or in the treatment of depression, where the (waiting)
time until healing or remission is observed (see e.g. Mielke et al., 2008). Here, we reconsider the
randomized, double-blind cross-over trial in the treatment of epilepsy from Sander et al. (1990),
which compares a new treatment (lamotrigine) as an add-on treatment to a placebo add-on by
means of 18 patients. Table 2 presents the total number of seizures within treatment weeks 9 to
12. Note that Mohanraj & Brodie (2003) highlight that for evaluating anti-epileptic drugs (AED)
as add-on treatment the standard endpoint is the change in the number of seizures.



Table 2
Three-armed clinical trial in treatment of epilepsy

Treatment           No. of patients    Total no. of seizures    Mean no. of seizures
                                                                per patient
Placebo add-on      18                 338                      18.78
Reference add-on    18                 295                      16.39
Test add-on         18                 288                      16.00



As AED trials performed in the past are two-armed, either placebo- or active-controlled (for
an overview see Mohanraj & Brodie, 2003), we add, for the purpose of illustrating our procedures, an
artificial reference treatment group of equal size (18 patients) with seizure counts of the same order of
magnitude as under the test treatment, also displayed in Table 2.

We presume that the number of seizures of each patient follows a Poisson distribution determined
by the group affiliation (T, R, P), i.e. the observations are from $X_{k1}, \dots, X_{kn_k} \sim \mathrm{Pois}(\lambda_k)$
for $k = T, R, P$ with $n_P = n_R = n_T = 18$. Table 2 displays the total number of seizures in
each group, $\sum_{i=1}^{n_k} X_{ki}$, $k = T, R, P$. As in this setting small values of $\lambda_k$, representing fewer
seizures, are desired, we choose $h(\lambda_k) = -\lambda_k$, which yields the retention of effect hypothesis

$$ H_{0,-\lambda_k} : \lambda_P - \lambda_T \le \Delta \cdot (\lambda_P - \lambda_R) \tag{2.2} $$

for demonstrating that the test treatment is non-inferior to the reference one.

2.3. Further examples 

In Table 3 we summarize various endpoints together with some common retention of effect
hypotheses. Moreover, we have included some models which have not been used in the context of
the retention of effect hypothesis, including the Weibull and Gamma families. However, these
endpoints are of practical interest, as recent non-inferiority trials by Yakhno et al. (2006) and Gurm
et al. (2008) highlight. We will not discuss all these situations in detail, but we mention that our
methodology immediately applies to these situations.



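Table 3
Survey of retention of effect hypotheses

Distribution                                $\theta_k$               $h(\theta_k)$                                                   $\sigma_k^2$
Normal (Pigeot et al., 2003)                $(\mu_k, \sigma^2)$      $\mu_k$                                                         $\sigma^2$
Normal (Hasler et al., 2008)                $(\mu_k, \sigma_k^2)$    $\mu_k$                                                         $\sigma_k^2$
Binary (Kieser & Friede, 2007, this work)   $\pi_k$                  $\pi_k$                                                         $\pi_k (1 - \pi_k)$
Binary                                      $\pi_k$                  $\log(\pi_k / (1 - \pi_k))$                                     $(\pi_k (1 - \pi_k))^{-1}$
Exponential (Mielke et al., 2008)           $\lambda_k$              $\log \lambda_k$                                                $1$
Poisson (this work)                         $\lambda_k$              $-\lambda_k$                                                    $\lambda_k$
Gamma                                       $(\alpha, \beta_k)$      $\alpha \cdot \beta_k$ $[= E X_k]$
Weibull                                     $(\lambda_k, \beta)$     $\lambda_k$ $[= E X_k \cdot (\Gamma(1 + \beta^{-1}))^{-1}]$     a function of $(\lambda_k, \beta)$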
3. Wald-type test: Theory



In this section, we derive a Wald-type test procedure for the generalized retention of effect hypoth- 
esis (1.2) introduced in Section 1 and discuss the estimation of the variance with restriction to the 
null hypothesis. This generalizes and unifies specific results for binary, normally and exponentially 
distributed endpoints. Based on this, we provide the theory for sample size planning in the next 
Section 4. 



Model assumptions: Let $X_{ki}$ for $i = 1, \dots, n_k$ be independently distributed according to a
parametric family of distributions with densities $\{f(\theta, \cdot) : \theta \in \Theta\}$, $\Theta \subset \mathbb{R}^d$, and parameters
$\theta_k \in \Theta$, $k = T, R, P$, where T, R and P abbreviate the test, reference and placebo group, respectively.
We presume that the family of probability densities $\{f(\theta, \cdot) : \theta \in \Theta\}$ is sufficiently regular to
obtain asymptotic normality of the ML-estimators (MLE) of the parameter with non-singular
covariance or Fisher information matrix, respectively, e.g. an exponential family or a family which
is differentiable in quadratic mean (van der Vaart, 1998). Moreover, none of the groups should
vanish asymptotically, i.e. for $k = T, R, P$ and $n = n_T + n_R + n_P$

$$ \frac{n_k}{n} \longrightarrow w_k \tag{3.1} $$

holds for $n_R, n_T, n_P \to \infty$ and some $w_k \in\, ]0, 1[$, the (asymptotic) proportion of the number of
patients in group $k = T, R, P$.



3.1. Retention of Effect Wald-type Test (RET) 



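In order to come up with a test for (1.2) we rewrite this as

$$ H_{0,h(\theta_k)} : \eta := h(\theta_T) - \Delta\, h(\theta_R) + (\Delta - 1)\, h(\theta_P) \le 0. \tag{3.2} $$

The MLE of $h(\theta_k)$, $k = T, R, P$, is obtained by plugging in the MLE $\hat\theta_k$ of $\theta_k$, which is
well-defined and asymptotically normally distributed by assumption. By the delta method this yields
that $\sqrt{n_k}\,(h(\hat\theta_k) - h(\theta_k))$ is centered asymptotically normally distributed with variance

$$ \sigma_k^2 = \left( \frac{\partial}{\partial\theta} h(\theta_k) \right) I(\theta_k)^{-1} \left( \frac{\partial}{\partial\theta} h(\theta_k) \right)^{T}, $$

where $I$ denotes the Fisher information matrix, i.e.

$$ I(\theta) = -E_\theta\left[ \frac{\partial^2}{\partial\theta^2} \log f(\theta, X) \right]. $$

Hence, the linear contrast $\sqrt{n}\,(\hat\eta - \eta)$, where the MLE $\hat\eta$ of $\eta$ is obtained by plugging the MLEs
$\hat\theta_k$, $k = R, T, P$, into the left-hand side of (3.2), is centered asymptotically normal with variance

$$ \sigma^2 = \frac{\sigma_T^2}{w_T} + \frac{\Delta^2 \sigma_R^2}{w_R} + \frac{(1 - \Delta)^2 \sigma_P^2}{w_P}. \tag{3.3} $$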
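As a worked illustration of the delta-method variance above, the following sketch evaluates the per-group variance for the two running examples, using the parametrizations $h(\pi_k) = \pi_k$ and $h(\lambda_k) = -\lambda_k$ from Section 2. Both parameters are one-dimensional here, so the Fisher information is a scalar.

```latex
% Binary endpoint, X ~ Bernoulli(\pi), h(\pi) = \pi:
I(\pi) = \frac{1}{\pi (1 - \pi)}, \qquad h'(\pi) = 1
\quad\Longrightarrow\quad
\sigma_k^2 = 1 \cdot \pi_k (1 - \pi_k) \cdot 1 = \pi_k (1 - \pi_k).

% Poisson endpoint, X ~ Pois(\lambda), h(\lambda) = -\lambda:
I(\lambda) = \frac{1}{\lambda}, \qquad h'(\lambda) = -1
\quad\Longrightarrow\quad
\sigma_k^2 = (-1) \cdot \lambda_k \cdot (-1) = \lambda_k.
```

These values agree with the corresponding rows of Table 3.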



As mentioned in the introduction, estimation of $\sigma^2$ simply by the MLE often leads to an
unsatisfactory approximation of the asymptotic normal law, and various improvements have been
suggested in specific settings, mainly for the case of binary endpoints (see Section 3.2). Therefore,
we will treat the case of restricted maximum likelihood estimation as well. To this end let $\hat\sigma^2_{ML}$
denote the MLE of $\sigma^2$ and $\hat\sigma^2_{RML}$ denote the MLE with restriction to the null hypothesis, i.e. the
MLE of $\sigma^2$ under the restriction in (3.2). Further let $\hat\sigma^2$ denote either $\hat\sigma^2_{ML}$ or $\hat\sigma^2_{RML}$; see
Section 3.2.1 for a discussion of both estimators. Both estimators are consistent under the null
hypothesis. Thus, we obtain as a test statistic for (3.2)

$$ T = \sqrt{n}\, \frac{\hat\eta}{\hat\sigma} = \sqrt{n}\, \frac{h(\hat\theta_T) - \Delta\, h(\hat\theta_R) + (\Delta - 1)\, h(\hat\theta_P)}{\hat\sigma}, \tag{3.4} $$

which is asymptotically standard normally distributed at the boundary of $H_{0,h(\theta_k)}$, i.e. when $\eta = 0$.
Therefore, $H_{0,h(\theta_k)}$ is rejected if $T > z_{1-\alpha}$, where $z_{1-\alpha}$ is the $(1 - \alpha)$-quantile of the standard normal
distribution and $\alpha$ a specified significance level. Due to the formulation of the hypothesis and the
test decision we will denote this test by Retention of Effect Wald Test (RET).
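The test decision can be sketched in a few lines of Python for binary endpoints with $h(\pi_k) = \pi_k$ and unrestricted variance estimation. This is a minimal illustration, not the accompanying R software: the function name `ret_wald_binary` and the margin `delta = 0.8` are assumptions made here, and the remission counts (43 of 86, 31 of 84 and 26 of 88 for test, reference and placebo) are those implied by the fractions reported in Table 1.

```python
from math import sqrt
from statistics import NormalDist


def ret_wald_binary(x, n, delta, alpha=0.05):
    """Unrestricted Wald-type RET for h(pi) = pi; x and n are ordered (T, R, P)."""
    p_t, p_r, p_p = (xk / nk for xk, nk in zip(x, n))
    # Plug-in estimate of eta = pi_T - delta * pi_R + (delta - 1) * pi_P.
    eta_hat = p_t - delta * p_r + (delta - 1.0) * p_p
    # sigma_hat^2 / n with w_k = n_k / n, i.e. the estimated variance of eta_hat.
    var_eta = (p_t * (1 - p_t) / n[0]
               + delta**2 * p_r * (1 - p_r) / n[1]
               + (1 - delta)**2 * p_p * (1 - p_p) / n[2])
    t_stat = eta_hat / sqrt(var_eta)
    z_crit = NormalDist().inv_cdf(1 - alpha)
    return t_stat, z_crit, t_stat > z_crit


# delta = 0.8 is a hypothetical margin chosen only for illustration.
t_stat, z_crit, reject = ret_wald_binary(x=(43, 31, 26), n=(86, 84, 88), delta=0.8)
print(f"T = {t_stat:.3f}, critical value = {z_crit:.3f}, reject H0: {reject}")
```

The sketch does not guard against estimated probabilities of exactly 0 or 1, for which the estimated variance may vanish.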

3.2. The estimators of the asymptotic variance $\sigma^2$ and their limits

In some situations, e.g. for normally distributed endpoints, it is sufficient to estimate the
asymptotic variance in (3.3) by the (unrestricted) MLE (see Pigeot et al., 2003). Roughly speaking,
this is due to the fact that the asymptotic variance of the test statistic does not depend on the
parameters $h(\theta_k)$ (in the normal case the mean) which enter into the hypothesis. However,
for binary endpoints, for example, the variance depends on the success probabilities themselves, and an
improvement in the accuracy of the asymptotic normality can be obtained by estimation restricted
to the null hypothesis. This has been pointed out by Farrington & Manning (1990) for the
two-sample comparison with binomial endpoints, and various improvements have been suggested
since (see e.g. Chan (1998), Röhmel & Mansmann (1999), Skipka et al. (2004)). For the retention
of effect hypothesis, Kieser & Friede (2007) demonstrate in an extensive simulation study that the
restricted Wald-type test (Farrington & Manning's (1990) adjustment) works satisfactorily and
clearly outperforms the unrestricted Wald-type test concerning the accuracy of the nominal level.

3.2.1. Computation of $\hat\sigma_{ML}$ and $\hat\sigma_{RML}$

Typically, the variance $\sigma^2$ is a continuous function of the parameters $\theta_k$, $k = T, R, P$. Thus, the
MLE $\hat\sigma_{ML}$ is obtained by plugging in the MLEs $\hat\theta_k$,

$$ \hat\sigma_{ML} = \sigma(\hat\theta_T, \hat\theta_R, \hat\theta_P). $$

In order to obtain the restricted MLE $\hat\sigma_{RML}$ the $\hat\theta_k$ have to be replaced by their restricted
versions, i.e.

$$ \hat\sigma_{RML} = \sigma(\hat\theta_{T,H_0}, \hat\theta_{R,H_0}, \hat\theta_{P,H_0}) $$

with

$$ (\hat\theta_{T,H_0}, \hat\theta_{R,H_0}, \hat\theta_{P,H_0}) = \operatorname*{arg\,sup}_{(\theta_T, \theta_R, \theta_P) \in H_{0,h(\theta_k)}} \; \sum_{k=T,R,P} \sum_{i=1}^{n_k} \log f(\theta_k, x_{ki}). \tag{3.5} $$

The restricted MLEs $(\hat\theta_{T,H_0}, \hat\theta_{R,H_0}, \hat\theta_{P,H_0})$ can be computed in the following way: if the
unrestricted MLEs $\hat\theta_k$, $k = T, R, P$, are located in $H_{0,h(\theta_k)}$, i.e. $h(\hat\theta_T) - \Delta\, h(\hat\theta_R) + (\Delta - 1)\, h(\hat\theta_P) \le 0$,
they coincide with the restricted MLEs. Otherwise the restricted MLEs can be determined by
restricting the likelihood function to the boundary of $H_{0,h(\theta_k)}$ by means of substituting
$\theta_T = h^{-1}(\Delta\, h(\theta_R) + (1 - \Delta)\, h(\theta_P))$ in the joint log-likelihood function in (3.5) and
maximizing this with respect to $\theta_R$ and $\theta_P$ numerically or, if possible, analytically.
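The two-case recipe above can be sketched for binary endpoints with $h(\pi_k) = \pi_k$ and $0 \le \Delta \le 1$. The code below is an illustration under stated assumptions, not the authors' implementation: the names `loglik` and `restricted_mle_binary` are hypothetical, and the numerical maximization uses a plain iteratively refined grid search rather than a dedicated optimizer.

```python
from math import log

EPS = 1e-6  # keeps probabilities strictly inside (0, 1)


def loglik(x, n, p):
    """Joint binomial log-likelihood (up to a constant); tuples ordered (T, R, P)."""
    return sum(xk * log(pk) + (nk - xk) * log(1 - pk) for xk, nk, pk in zip(x, n, p))


def restricted_mle_binary(x, n, delta, rounds=7, grid=41):
    """Restricted MLE of (pi_T, pi_R, pi_P) for h(pi) = pi and 0 <= delta <= 1."""
    p_hat = tuple(xk / nk for xk, nk in zip(x, n))
    eta_hat = p_hat[0] - delta * p_hat[1] + (delta - 1.0) * p_hat[2]
    if eta_hat <= 0:  # unrestricted MLE already lies in H0
        return p_hat
    lo_r, hi_r, lo_p, hi_p = EPS, 1 - EPS, EPS, 1 - EPS
    for _ in range(rounds):
        best = None
        for i in range(grid):
            p_r = lo_r + (hi_r - lo_r) * i / (grid - 1)
            for j in range(grid):
                p_p = lo_p + (hi_p - lo_p) * j / (grid - 1)
                p_t = delta * p_r + (1 - delta) * p_p  # boundary of H0
                value = loglik(x, n, (p_t, p_r, p_p))
                if best is None or value > best[0]:
                    best = (value, p_r, p_p)
        _, p_r, p_p = best
        # Shrink the search box around the current best grid point.
        step_r = (hi_r - lo_r) / (grid - 1)
        step_p = (hi_p - lo_p) / (grid - 1)
        lo_r, hi_r = max(EPS, p_r - 2 * step_r), min(1 - EPS, p_r + 2 * step_r)
        lo_p, hi_p = max(EPS, p_p - 2 * step_p), min(1 - EPS, p_p + 2 * step_p)
    return (delta * p_r + (1 - delta) * p_p, p_r, p_p)


# Remission counts implied by Table 1; delta = 0.8 is a hypothetical margin.
pi_t, pi_r, pi_p = restricted_mle_binary(x=(43, 31, 26), n=(86, 84, 88), delta=0.8)
print(f"restricted MLEs: T = {pi_t:.4f}, R = {pi_r:.4f}, P = {pi_p:.4f}")
```

Plugging the restricted estimates into the variance formula (3.3) then gives $\hat\sigma^2_{RML}$. The grid refinement relies on the log-likelihood being concave along the boundary, which holds here because the substitution for $\pi_T$ is affine.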

3.2.2. Limits of the variance estimators 

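The limits of the MLEs $\hat\sigma_{ML}$ and $\hat\sigma_{RML}$ are crucial for sample size planning in
Section 4. For the derivation of the limits let us denote the true (unknown) parameters by $\theta_k^{(0)}$,
$k = T, R, P$, and correspondingly $\eta^{(0)} = h(\theta_T^{(0)}) - \Delta\, h(\theta_R^{(0)}) + (\Delta - 1)\, h(\theta_P^{(0)})$ and

$$ \sigma_{0,k}^2 = \left( \frac{\partial}{\partial\theta} h(\theta_k^{(0)}) \right) I(\theta_k^{(0)})^{-1} \left( \frac{\partial}{\partial\theta} h(\theta_k^{(0)}) \right)^{T} \tag{3.6} $$

for $k = R, T, P$ and

$$ \sigma_0^2 = \frac{\sigma_{0,T}^2}{w_T} + \frac{\Delta^2 \sigma_{0,R}^2}{w_R} + \frac{(1 - \Delta)^2 \sigma_{0,P}^2}{w_P}. \tag{3.7} $$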

The unrestricted MLE $\hat\sigma^2_{ML}$ is always a consistent estimator, i.e. $\hat\sigma^2_{ML} \to \sigma_0^2$ as $n \to \infty$. However,
the restricted MLE $\hat\sigma_{RML}$ is only consistent when the true parameters are located in the
hypothesis, i.e. $\eta^{(0)} \le 0$. In other words, the limit of $\hat\sigma^2_{RML}$ is in general no longer equal to $\sigma_0^2$. We will
now derive the limit of the restricted MLE $\hat\sigma_{RML}$ when the parameters are located in the
alternative, i.e. $\eta^{(0)} > 0$. This requires computation of the Kullback-Leibler divergence (KL-divergence)
between two parameter constellations. To this end, let $\zeta = (\theta_T, \theta_R, \theta_P)$ denote any parameter in
the parameter space $\Theta^3 \subset \mathbb{R}^{3d}$ and $\zeta^{(0)}$ the true parameter. Then we define for the three-sample
case a weighted KL-divergence between $\zeta^{(0)}$ and $\zeta$ with weights $c = (c_T, c_R, c_P)$ by

$$ K(\zeta^{(0)}, \zeta, c) = \sum_{k=T,R,P} c_k \cdot K(\theta_k^{(0)}, \theta_k), \tag{3.8} $$

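where $K(\theta_k^{(0)}, \theta_k) = E_{\theta_k^{(0)}}\left[ \log f(\theta_k^{(0)}, X) - \log f(\theta_k, X) \right]$ denotes the usual KL-divergence measuring
the difference between two densities. According to Theorem 2 (see Appendix A.1) the restricted
MLE $\hat\zeta_{H_0} = (\hat\theta_{T,H_0}, \hat\theta_{R,H_0}, \hat\theta_{P,H_0})$ converges to the minimizer of the sample size weighted
KL-divergence to the true parameter, i.e.

$$ \hat\zeta_{H_0} \to \zeta_{H_0} $$

with

$$ \zeta_{H_0} = (\theta_{T,H_0}, \theta_{R,H_0}, \theta_{P,H_0}) = \operatorname*{arg\,min}_{\zeta \in H_{0,h(\theta_k)}} K(\zeta^{(0)}, \zeta, (w_T, w_R, w_P)). $$

Therefore, the limit of the restricted MLE $\hat\sigma_{RML}$ is obtained by

$$ \sigma^2_{RML} = \frac{\sigma^2_{T,H_0}}{w_T} + \frac{\Delta^2 \sigma^2_{R,H_0}}{w_R} + \frac{(1 - \Delta)^2 \sigma^2_{P,H_0}}{w_P} \tag{3.9} $$

with

$$ \sigma^2_{k,H_0} = \left( \frac{\partial}{\partial\theta} h(\theta_{k,H_0}) \right) I(\theta_{k,H_0})^{-1} \left( \frac{\partial}{\partial\theta} h(\theta_{k,H_0}) \right)^{T} $$

for $k = T, R, P$.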



3.2.3. Numerical computation of $\sigma_{RML}$

For computing the minimizers $\theta_{k,H_0}$, $k = T, R, P$, and therewith $\sigma_{RML}$ for a parameter
constellation in the alternative, i.e. $\eta^{(0)} > 0$, it is sufficient to restrict to the boundary of $H_{0,h(\theta_k)}$, i.e. we
replace in the weighted KL-divergence (3.8) $\theta_T$ by $h^{-1}(\Delta\, h(\theta_R) + (1 - \Delta)\, h(\theta_P))$ and then minimize
the KL-divergence with respect to $\theta_R$ and $\theta_P$.

In practice, the analytic solution to the minimization problem of the KL-divergence may be
hard (cf. the example of Poisson endpoints in Section 5.2.3) or even infeasible to find. In
this case, numerical minimization becomes necessary. To this end, it is important to note that
the minimization of the KL-divergence often results in a convex optimization problem, so that fast
algorithms for convex optimization, such as the Newton-Raphson algorithm, become applicable. The
following theorem states conditions to obtain a convex optimization problem.

Theorem 1: Let $-\frac{\partial^2}{\partial\theta^2} E_{\theta_k^{(0)}}[\log f(\theta, X)]$ be non-negative for all $\theta \in \Theta$ and $\theta_k^{(0)}$, $k = T, R, P$, and
let $\Theta$ be a convex set. Further, let $h^{-1}(\Delta\, h(\theta_R) + (1 - \Delta)\, h(\theta_P))$ be an affine transformation in
$\theta_R$ and $\theta_P$. Then, restricted to the boundary of the null hypothesis, the minimization in $\zeta$ of the
weighted KL-divergence (3.8) is a convex optimization problem.

The conditions of Theorem 1 are fulfilled in our examples of Poisson and binary endpoints, 
which will be revisited in Section 5. 
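For Poisson endpoints with $h(\lambda_k) = -\lambda_k$ the boundary substitution is $\lambda_T = \Delta \lambda_R + (1 - \Delta) \lambda_P$, which is affine, and $K(\lambda^{(0)}, \lambda) = \lambda^{(0)} \log(\lambda^{(0)} / \lambda) - \lambda^{(0)} + \lambda$. The following sketch computes the limit $\sigma^2_{RML}$ numerically. It is an illustration only: the function names are hypothetical, a refined grid search stands in for the Newton-Raphson algorithm mentioned above, and the rates are the per-patient means of Table 2 used as a hypothetical planning alternative with $\Delta = 0.8$ and a balanced allocation.

```python
from math import log


def kl_poisson(lam0, lam):
    """KL divergence K(lam0, lam) between Pois(lam0) and Pois(lam)."""
    return lam0 * log(lam0 / lam) - lam0 + lam


def sigma2_rml_limit_poisson(lam0, w, delta, rounds=8, grid=41):
    """Limit of the restricted variance estimator for h(lam) = -lam, 0 <= delta <= 1.

    lam0 and w hold the true rates and allocation weights, ordered (T, R, P).
    Returns the KL minimizer on H0 and the resulting sigma_RML^2.
    """
    eta0 = -lam0[0] + delta * lam0[1] + (1 - delta) * lam0[2]
    if eta0 <= 0:
        lam_t, lam_r, lam_p = lam0  # truth lies in H0, so the minimizer is the truth
    else:
        upper = 2.0 * max(lam0)
        lower = 1e-6 * upper
        lo_r, hi_r, lo_p, hi_p = lower, upper, lower, upper
        for _ in range(rounds):
            best = None
            for i in range(grid):
                lam_r = lo_r + (hi_r - lo_r) * i / (grid - 1)
                for j in range(grid):
                    lam_p = lo_p + (hi_p - lo_p) * j / (grid - 1)
                    lam_t = delta * lam_r + (1 - delta) * lam_p  # boundary of H0
                    value = (w[0] * kl_poisson(lam0[0], lam_t)
                             + w[1] * kl_poisson(lam0[1], lam_r)
                             + w[2] * kl_poisson(lam0[2], lam_p))
                    if best is None or value < best[0]:
                        best = (value, lam_r, lam_p)
            _, lam_r, lam_p = best
            # Shrink the search box around the current best grid point.
            step_r = (hi_r - lo_r) / (grid - 1)
            step_p = (hi_p - lo_p) / (grid - 1)
            lo_r, hi_r = max(lower, lam_r - 2 * step_r), min(upper, lam_r + 2 * step_r)
            lo_p, hi_p = max(lower, lam_p - 2 * step_p), min(upper, lam_p + 2 * step_p)
        lam_t = delta * lam_r + (1 - delta) * lam_p
    # For Poisson endpoints sigma_{k,H0}^2 equals lambda_{k,H0}.
    sigma2 = lam_t / w[0] + delta**2 * lam_r / w[1] + (1 - delta)**2 * lam_p / w[2]
    return (lam_t, lam_r, lam_p), sigma2


weights = (1 / 3, 1 / 3, 1 / 3)
lam_h0, sigma2_rml = sigma2_rml_limit_poisson((16.0, 16.39, 18.78), weights, delta=0.8)
print(lam_h0, sigma2_rml)
```

In the alternative the minimizer moves $\lambda_T$ up and $\lambda_R$, $\lambda_P$ down until the boundary is reached, so $\sigma^2_{RML}$ generally differs from $\sigma_0^2$.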
3.3. Approximating the power function of the RET

The asymptotic normality used in Section 3.1 to derive the RET is valid for parameter
constellations in the hypothesis as well as for constellations in the alternative. Thus, if the variance $\sigma^2$ is
estimated unrestrictedly, $\hat\sigma^2 = \hat\sigma^2_{ML}$, we obtain as an approximation to the power function of the
RET, i.e. the probability of rejecting the hypothesis $H_{0,h(\theta_k)}$ in (1.2),

$$ P_{\eta^{(0)}}(T > z_{1-\alpha}) \approx 1 - \Phi\left( z_{1-\alpha} - \sqrt{n}\, \frac{\eta^{(0)}}{\sigma_0} \right). \tag{3.11} $$

However, estimating the variance $\sigma^2$ restricted to the null hypothesis, i.e. $\hat\sigma^2 = \hat\sigma^2_{RML}$, complicates
the issue and changes the power function to

$$
\begin{aligned}
P_{\eta^{(0)}}(T > z_{1-\alpha})
&= P_{\eta^{(0)}}\left( \sqrt{n}\, \frac{\hat\eta - \eta^{(0)}}{\sigma_0} > z_{1-\alpha}\, \frac{\hat\sigma_{RML}}{\sigma_0} - \sqrt{n}\, \frac{\eta^{(0)}}{\sigma_0} \right) \\
&\approx 1 - \Phi\left( z_{1-\alpha}\, \frac{\sigma_{RML}}{\sigma_0} - \sqrt{n}\, \frac{\eta^{(0)}}{\sigma_0} \right).
\end{aligned}
\tag{3.12}
$$

Note that (3.11) can be obtained from (3.12) by substituting $\sigma_{RML}$ by $\sigma_0$.
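Both power approximations can be evaluated with one small function, since (3.11) is the special case $\sigma_{RML} = \sigma_0$ of (3.12). The sketch below uses only the Python standard library; the function name `ret_power` and the numerical inputs are hypothetical and serve only to illustrate the formulas.

```python
from math import sqrt
from statistics import NormalDist


def ret_power(n, eta0, sigma0, sigma_rml=None, alpha=0.05):
    """Approximate power of the RET: (3.11) if sigma_rml is None, otherwise (3.12)."""
    if sigma_rml is None:
        sigma_rml = sigma0  # unrestricted variance estimation
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1 - alpha)
    return 1.0 - std_normal.cdf(z * sigma_rml / sigma0 - sqrt(n) * eta0 / sigma0)


# Hypothetical planning values: eta0 = 0.1, sigma0 = 1.0, total sample size n = 600.
print(ret_power(600, 0.1, 1.0))                  # unrestricted, formula (3.11)
print(ret_power(600, 0.1, 1.0, sigma_rml=1.05))  # restricted, formula (3.12)
```

On the boundary of the hypothesis ($\eta^{(0)} = 0$, where $\sigma_{RML} = \sigma_0$) the function returns the nominal level $\alpha$, as it should.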



4. Sample size formula and optimal allocation of samples 

In this section, we present a sample size formula for the test of the generalized retention of effect
hypothesis $H_{0,h(\theta_k)}$ (1.2) introduced in Section 1. In particular, we derive the optimal allocation
of the samples to the groups T, R and P in terms of maximizing the power of the RET under any
fixed alternative $\eta^{(0)}$.

4.1. Optimal sample allocation

In planning a trial, one typically specifies a parameter constellation $\eta^{(0)}$ in the alternative. Our
aim in this section is to optimize the allocation of samples, represented through $w_k$, $k = T, R, P$,
as in (3.1), such that the power of the test decision in (3.11) or (3.12), respectively, is maximized.
The power depends on the allocation through $\sigma_0^2$ and $\sigma^2_{RML}$.

When the variance $\sigma^2$ is estimated unrestrictedly in the test procedure ($\hat\sigma^2 = \hat\sigma^2_{ML}$) we only have
to consider $\sigma_0^2$ to investigate the influence of the allocation on the power, cf. (3.11). This means
that we have to minimize $\sigma_0^2$ in order to maximize the power. By straightforward calculations,
presented in Appendix A.3, we obtain as a major and generally valid result that the (asymptotically)
optimal allocation of samples for the RET is given by

$$ n_T : n_R : n_P = 1 : \Delta \frac{\sigma_{0,R}}{\sigma_{0,T}} : |1 - \Delta| \frac{\sigma_{0,P}}{\sigma_{0,T}}. \tag{4.1} $$

The resulting minimal variance is given by

$$ \sigma^2_{0,\mathrm{optimal}} = \left( \sigma_{0,T} + \Delta\, \sigma_{0,R} + |1 - \Delta|\, \sigma_{0,P} \right)^2. $$
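The allocation (4.1) and the resulting minimal variance are easy to check numerically. The sketch below does so for Poisson endpoints, where $\sigma_{0,k}^2 = \lambda_k$; the function names, the rates (per-patient means of Table 2 used as a hypothetical alternative) and $\Delta = 0.8$ are assumptions made for illustration. The weights are only well defined for $\Delta \notin \{0, 1\}$, since otherwise one group receives weight zero.

```python
from math import sqrt


def asymptotic_variance(sigma0, w, delta):
    """sigma_0^2 from (3.7); sigma0 and w are ordered (T, R, P)."""
    s_t, s_r, s_p = sigma0
    return s_t**2 / w[0] + delta**2 * s_r**2 / w[1] + (1 - delta)**2 * s_p**2 / w[2]


def optimal_allocation(sigma0, delta):
    """Weights proportional to 1 : delta * s_R / s_T : |1 - delta| * s_P / s_T."""
    s_t, s_r, s_p = sigma0
    parts = (s_t, delta * s_r, abs(1 - delta) * s_p)
    total = sum(parts)
    return tuple(part / total for part in parts)


sigma0 = (sqrt(16.0), sqrt(16.39), sqrt(18.78))  # Poisson: sigma_{0,k}^2 = lambda_k
delta = 0.8
w_opt = optimal_allocation(sigma0, delta)

var_opt = asymptotic_variance(sigma0, w_opt, delta)
var_balanced = asymptotic_variance(sigma0, (1 / 3, 1 / 3, 1 / 3), delta)
var_221 = asymptotic_variance(sigma0, (0.4, 0.4, 0.2), delta)
closed_form = (sigma0[0] + delta * sigma0[1] + abs(1 - delta) * sigma0[2]) ** 2

print("optimal weights (T, R, P):", tuple(round(wk, 3) for wk in w_opt))
print("variances: optimal", var_opt, "balanced", var_balanced, "2:2:1", var_221)
```

Because the required sample size is proportional to $\sigma_0^2$ when the variance is estimated unrestrictedly, the ratio of two such variances directly gives the relative sample size of two allocations.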

Remark: For the specific case of normal endpoints with equal variances (Pigeot et al. (2003),
Schwartz & Denne (2006)) and exponentially distributed endpoints (Mielke et al., 2008) we again obtain
the optimal allocation $1 : \Delta : |1 - \Delta|$.

When the variance $\sigma^2$ is estimated under restriction to $H_{0,h(\theta_k)}$ the asymptotic power in (3.12)
depends additionally on $\sigma_{RML}/\sigma_0$, because under any alternative the restricted estimator $\hat\sigma_{RML}$
is not a consistent estimator for $\sigma_0$. Nevertheless, the asymptotically optimal allocation derived
for the unrestricted case is again optimal in an asymptotic sense because the power in (3.12) is
dominated by the term $\sqrt{n} \cdot \eta^{(0)} / \sigma_0$ as $n$ grows. Hence, the allocation (4.1) derived above,
which minimizes the variance $\sigma_0^2$, is also the (asymptotically) optimal allocation in terms
of maximizing the power when the variance $\sigma^2$ is estimated restricted to $H_{0,h(\theta_k)}$.

Remark: (a) We would like to stress that this result can be applied to the case of binary endpoints
(see Section 5.1.2). This leads to results different from those in Kieser & Friede (2007), who derived the
optimal allocation under the additional restriction of a fixed ratio $w_R / w_T$.

(b) The asymptotically optimal allocation presented in (4.1) should be understood as approximate
for finite samples, as is customary for asymptotic results. Nevertheless, for the
examples presented in this paper we will show in Section 5 that the optimal allocation is also very
accurate for finite samples, e.g. for a power of 80%. However, one should be aware of the fact that,
in particular for small sample sizes, it is not guaranteed in general that the allocation (4.1) is
optimal.



4.1.1. Rule of thumb

The asymptotically optimal sample allocation (4.1) depends on the choice of the alternative $\eta^{(0)} > 0$.
If one is not clear about the choice of the alternative or wants to consider more than one
alternative, we recommend using the allocation $1 : \Delta : (1-\Delta)$ as a rule of thumb. We show
in Appendix A.4 for $\theta_T^{(0)} = \theta_R^{(0)}$ and $0 < \Delta < 1$ that the allocation $1 : \Delta : (1-\Delta)$ is more
appropriate than the commonly used 2:2:1 allocation (the balanced allocation) if
$\sigma^2_{0,P}/\sigma^2_{0,T} < 2.12$ $(2.73)$. Note that a lower bound for $\sigma^2_{0,P}/\sigma^2_{0,T}$ is not required. Moreover, this result is valid
independently of the distribution of the endpoints and of the formulation of the retention of effect
hypothesis.



4.2. Sample size computation

When the variance $\sigma^2$ is estimated unrestrictedly ($\hat\sigma^2 = \hat\sigma^2_{ML}$) we end up with the simplified power
formula (3.11). Thus, the minimal required total sample size to obtain a power of $1-\beta$ for a given
alternative $\eta^{(0)} > 0$ is determined by

$$n_{1-\beta} \approx (z_{1-\alpha} + z_{1-\beta})^2 \cdot \left(\frac{\sigma_0}{\eta^{(0)}}\right)^2 \tag{4.2}$$

with $\sigma_0$ defined in (3.7). When the variance $\sigma^2$ is estimated restricted to the null hypothesis
($\hat\sigma^2 = \hat\sigma^2_{RML}$) the sample size formula has to be derived from (3.12) and becomes more involved,
viz.

$$n_{1-\beta} \approx \left(\frac{z_{1-\alpha}\,\sigma_{RML} + z_{1-\beta}\,\sigma_0}{\eta^{(0)}}\right)^2 = \left(z_{1-\alpha}\,\frac{\sigma_{RML}}{\sigma_0} + z_{1-\beta}\right)^2 \cdot \left(\frac{\sigma_0}{\eta^{(0)}}\right)^2, \tag{4.3}$$

with $\sigma_{RML}$ derived in (3.9). As we will see, the additional term $\sigma_{RML}/\sigma_0$ has a relevant impact
on the sample size planning.
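To make the two formulas concrete, the following Python sketch evaluates (4.2) and (4.3) for one setting. It is an illustration under stated assumptions rather than the accompanying R software: the function names are hypothetical, the totals are rounded up to the next integer, and the inputs correspond to the binary setting $\pi_{0,T} = \pi_{0,R} = 0.9$, $\pi_{0,P} = 0.1$, $\Delta = 0.7$ with the optimal allocation, for which $\eta^{(0)} = 0.24$, $\sigma_0 = 0.6$ and $\sigma_0/\sigma_{RML} = 0.791$ (see Table 4).

```python
from math import ceil
from statistics import NormalDist


def n_unrestricted(eta0, sigma0, alpha=0.05, power=0.8):
    """Total sample size from (4.2): variance estimated unrestrictedly."""
    z_a = NormalDist().inv_cdf(1.0 - alpha)
    z_b = NormalDist().inv_cdf(power)
    return (z_a + z_b) ** 2 * (sigma0 / eta0) ** 2


def n_restricted(eta0, sigma0, sigma_rml, alpha=0.05, power=0.8):
    """Total sample size from (4.3): variance estimated under the null hypothesis."""
    z_a = NormalDist().inv_cdf(1.0 - alpha)
    z_b = NormalDist().inv_cdf(power)
    return ((z_a * sigma_rml + z_b * sigma0) / eta0) ** 2


eta0, sigma0 = 0.24, 0.6
sigma_rml = sigma0 / 0.791  # limit of the restricted estimator, taken from Table 4

print(ceil(n_unrestricted(eta0, sigma0)))           # 39
print(ceil(n_restricted(eta0, sigma0, sigma_rml)))  # 54
```

The gap between the two totals shows how strongly the ratio $\sigma_{RML}/\sigma_0$ can matter when the required sample size is small.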

In Figure 1 we have summarized the general strategy for sample size planning (GSSP) for the
RET when the variance $\sigma^2$ is estimated with restriction to the null hypothesis. When the variance
$\sigma^2$ is estimated unrestrictedly by $\hat\sigma^2_{ML}$ we may omit the steps 2.-4. in Figure 1 and use the simpler
formula (4.2) in step 5. to compute the required sample size $n_{1-\beta}$.

Remark: We stress again that the use of $\hat\sigma_{RML}$ will affect the planning of the trial significantly.
If one replaces $\sigma_{RML}$ by $\sigma_0$ in (4.3), this may result in a too small or too large required sample
size, depending on the ratio $\sigma_{RML}/\sigma_0$. If the ratio $\sigma_{RML}/\sigma_0$ is greater (smaller) than one, then we
end up with a too small (large) required sample size, i.e. the resulting power is smaller (larger)
than the desired power $1-\beta$. For example, this will be the case for Poisson distributed endpoints
(see Section 2.2) and the hypothesis (2.2): we will see in Section 5.2 that $\sigma_{RML}/\sigma_0 > 1$ for all
parameter constellations. In contrast, for binary endpoints (see Section 2.1) and the hypothesis
(2.1), we will show in Section 5.1 that there is no strict relationship between $\sigma_{RML}$ and $\sigma_0$. Thus,
a wrongly specified sample size may result in a too large or too small power compared to the
aspired one.



Mielke & Munk /Evaluating and planning the RET 






General strategy for sample size planning (GSSP)

Input:
    $h(\cdot)$ : Measure of efficacy
    $\theta_T^{(0)}, \theta_R^{(0)}, \theta_P^{(0)}$ : Parameter constellation in the alternative, $\eta^{(0)} > 0$
    $w_T : w_R : w_P$ : Allocation of samples
    $\Delta$ : Non-inferiority margin
    $\alpha$ : Significance level
    $1-\beta$ : Aspired power

Procedure:

1. Compute $\eta^{(0)} = h(\theta_T^{(0)}) - \Delta\, h(\theta_R^{(0)}) + (\Delta - 1)\, h(\theta_P^{(0)})$.

2. Compute $\sigma_0^2$ via (3.7),

$$\sigma_0^2 = \frac{\sigma^2_{0,T}}{w_T} + \Delta^2\,\frac{\sigma^2_{0,R}}{w_R} + (1-\Delta)^2\,\frac{\sigma^2_{0,P}}{w_P},$$

with $\sigma^2_{0,k}$, $k = T, R, P$, from (3.6).

3. Determine the weighted KL-divergence (3.8) for the endpoint of investigation.

4. Compute the parameter constellation $\theta_{k,H_0}$, $k = T, R, P$, in the null hypothesis, which
minimizes the weighted KL-divergence to the true parameter. This can be done analytically
or numerically (confer Section 3.2.3).

5. Compute

$$\sigma^2_{RML} = \frac{\sigma^2_{T,H_0}}{w_T} + \Delta^2\,\frac{\sigma^2_{R,H_0}}{w_R} + (1-\Delta)^2\,\frac{\sigma^2_{P,H_0}}{w_P}$$

for $\theta_{k,H_0}$, $k = T, R, P$, via (3.9) and (3.10).
Use formula (4.3) to compute the minimal total required sample size

$$n_{1-\beta} \approx \left(\frac{z_{1-\alpha}\,\sigma_{RML} + z_{1-\beta}\,\sigma_0}{\eta^{(0)}}\right)^2.$$

Fig 1. General strategy for sample size planning (GSSP) when the variance $\sigma^2$ is estimated with restriction to the
null hypothesis.



5. Examples revisited 

In the following we will perform the RET for the examples from Section 2 and we will illustrate the
general strategy for sample size planning (GSSP), including a detailed investigation of the optimal
allocation.



5.1. Binary endpoints: Treatment of depression 

In this section, we revisit the example in the treatment of depression introduced in Section 2.1. 



5.1.1. Performing the RET 

For the sake of completeness we recall the RET for the situation $h(\pi_k) = \pi_k$, which was already
introduced by Tang & Tang (2004) and Kieser & Friede (2007). The MLE of $\pi_k$ is $\hat\pi_k = n_k^{-1}\sum_{i=1}^{n_k} X_{ki}$,
which is asymptotically normally distributed with variance $\sigma_k^2 = \pi_k(1-\pi_k)$. Hence, the unrestricted
MLE of the variance $\sigma^2$ is given by (cf. (3.3))

$$\hat\sigma^2_{ML} = n\left(\frac{\hat\pi_T(1-\hat\pi_T)}{n_T} + \Delta^2\,\frac{\hat\pi_R(1-\hat\pi_R)}{n_R} + (1-\Delta)^2\,\frac{\hat\pi_P(1-\hat\pi_P)}{n_P}\right)$$

and we end up with the test statistic (see (3.4))

$$T = \sqrt{n}\;\frac{\hat\pi_T - \Delta\,\hat\pi_R + (\Delta - 1)\,\hat\pi_P}{\hat\sigma_{ML}} \tag{5.1}$$

in order to test $H_{0,\pi_k}$ in (2.1), which is rejected if $T > z_{1-\alpha}$.

Fig 2. Example of binary distributed endpoints: Sample size reduction in % when optimal allocation (5.3) is used
instead of the balanced allocation (right figure) and instead of the allocation 2:2:1 (left figure) for $\pi_{0,P} = 0.1$ and
different values $\Delta = 0.5$ (dotted line), $\Delta = 0.6$ (dashed line), $\Delta = 0.7$ (dotdash line), $\Delta = 0.8$ (solid line).
[x-axis: $\pi_{0,R}$]

Let us now consider the case where $\sigma^2$ is estimated restrictedly (cf. Farrington & Manning,
1990). The restricted version of the Wald-type test is obtained by replacing the MLEs $\hat\pi_k$ in
the denominator by the ones restricted to $H_{0,\pi_k}$. Here, we have computed the restricted MLEs
according to Section 3.2.1 by substituting $\pi_T = \Delta\pi_R + (1-\Delta)\pi_P$ in the common
likelihood function and maximizing it numerically with respect to $\pi_R$ and $\pi_P$. Note that, in
contrast to the two-sample case (Farrington & Manning, 1990), an analytical computation of the
restricted MLEs is no longer feasible.

The RET for the hypothesis (2.1) with $\Delta = 0.8$ yields $T = 2.104$ $(2.108)$ in (5.1) using the
restricted (unrestricted) estimator for the variance estimation, with corresponding p-values 1.77%
(1.75%). Thus, we would reject $H_{0,\pi_k}$ from (2.1) in both cases and claim that the test treatment
duloxetine is non-inferior to paroxetine.
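The unrestricted version of the test can be written in a few lines. The Python sketch below is only an illustration of (5.1) and not the accompanying R implementation; the function name `ret_binary_unrestricted` is hypothetical and the counts are invented for demonstration, they are not the data of the depression trial.

```python
from math import sqrt
from statistics import NormalDist


def ret_binary_unrestricted(successes, sizes, delta):
    """Wald-type RET statistic (5.1) with unrestricted variance estimation.

    successes, sizes: dicts keyed by 'T', 'R', 'P' (test, reference, placebo).
    Returns the statistic T and the one-sided p-value 1 - Phi(T).
    """
    p_hat = {k: successes[k] / sizes[k] for k in "TRP"}
    eta_hat = p_hat["T"] - delta * p_hat["R"] + (delta - 1.0) * p_hat["P"]
    coef_sq = {"T": 1.0, "R": delta ** 2, "P": (1.0 - delta) ** 2}
    # sqrt(n) * eta_hat / sigma_hat_ML equals eta_hat divided by this standard error
    std_err = sqrt(sum(coef_sq[k] * p_hat[k] * (1.0 - p_hat[k]) / sizes[k]
                       for k in "TRP"))
    t_stat = eta_hat / std_err
    return t_stat, 1.0 - NormalDist().cdf(t_stat)


# Hypothetical counts (an assumption for illustration only)
t_stat, p_value = ret_binary_unrestricted(
    {"T": 60, "R": 55, "P": 30}, {"T": 100, "R": 100, "P": 100}, delta=0.8)
print(round(t_stat, 3), round(p_value, 4))
```

The restricted version would replace the estimates in the standard error by the MLEs restricted to the null hypothesis, which requires a numerical optimization and is omitted here.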



5.1.2. Optimal allocation 

For binary distributed endpoints and the hypothesis (2.1) the optimal allocation of samples is
given by

$$n_T : n_R : n_P = 1 : \Delta\sqrt{\frac{\pi_{0,R}(1-\pi_{0,R})}{\pi_{0,T}(1-\pi_{0,T})}} : |1-\Delta|\sqrt{\frac{\pi_{0,P}(1-\pi_{0,P})}{\pi_{0,T}(1-\pi_{0,T})}} \tag{5.2}$$

according to (4.1). For the commonly used alternative $\pi_{0,R} = \pi_{0,T}$ the allocation simplifies to

$$1 : \Delta : |1-\Delta|\sqrt{\frac{\pi_{0,P}(1-\pi_{0,P})}{\pi_{0,T}(1-\pi_{0,T})}}. \tag{5.3}$$

In contrast to the case of normally distributed endpoints, where the optimal allocation is given
by $1 : \Delta : |1-\Delta|$ (cf. Pigeot et al., 2003), the optimal allocation depends on the parameter of
investigation.
Table 4

Example of binomial distributed endpoints: Optimal sample allocation, limit of variance estimator $\sigma_{RML}$ and
required sample size from formula (4.3) and (4.2), respectively, to obtain a power of 0.7 and 0.8, respectively,
when the variance $\sigma^2$ is estimated restrictedly to the null hypothesis (unrestrictedly), where $\alpha = 5\%$, $\Delta = 0.7$.

| $\pi_{0,P}$ | $\pi_{0,R}$ | $w_T^*$ | $w_R^*$ | $w_P^*$ | Optimal allocation: $\sigma_0/\sigma_{RML}$ | Optimal allocation: $n_{0.7}$ | Optimal allocation: $n_{0.8}$ | 2:2:1 allocation: $\sigma_0/\sigma_{RML}$ | 2:2:1 allocation: $n_{0.7}$ | 2:2:1 allocation: $n_{0.8}$ |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.1 | 0.3 | 0.527 | 0.369 | 0.104 | 0.994 | 997 (988) | 1308 (1297) | 1.014 | 1054 (1076) | 1388 (1414) |
| 0.1 | 0.5 | 0.532 | 0.372 | 0.096 | 0.986 | 296 (289) | 387 (380) | 1.006 | 315 (318) | 415 (418) |
| 0.1 | 0.7 | 0.527 | 0.369 | 0.104 | 0.955 | 118 (110) | 154 (145) | 0.965 | 127 (120) | 165 (158) |
| 0.1 | 0.9 | 0.500 | 0.350 | 0.150 | 0.791 | 43 (30) | 54 (39) | 0.759 | 48 (31) | 60 (41) |
| 0.3 | 0.5 | 0.506 | 0.354 | 0.139 | 0.998 | 1279 (1275) | 1680 (1675) | 1.012 | 1341 (1341) | 1761 (1762) |
| 0.3 | 0.7 | 0.500 | 0.350 | 0.150 | 0.986 | 281 (275) | 368 (361) | 0.975 | 298 (287) | 390 (377) |
| 0.3 | 0.9 | 0.463 | 0.324 | 0.212 | 0.867 | 76 (61) | 98 (81) | 0.830 | 84 (63) | 106 (83) |
| 0.5 | 0.7 | 0.493 | 0.345 | 0.161 | 0.997 | 1134 (1129) | 1489 (1483) | 0.988 | 1191 (1170) | 1561 (1537) |
| 0.5 | 0.9 | 0.455 | 0.318 | 0.227 | 0.924 | 161 (143) | 209 (188) | 0.894 | 174 (147) | 224 (193) |
| 0.7 | 0.8 | 0.489 | 0.343 | 0.168 | 0.998 | 3505 (3495) | 4603 (4591) | 0.989 | 3672 (3611) | 4814 (4744) |
| 0.7 | 0.9 | 0.463 | 0.324 | 0.212 | 0.974 | 571 (549) | 746 (721) | 0.949 | 609 (562) | 792 (739) |
| 0.8 | 0.9 | 0.476 | 0.333 | 0.190 | 0.992 | 2101 (2076) | 2756 (2727) | 0.975 | 2214 (2130) | 2895 (2798) |

Kieser & Friede (2007) derived the optimal allocation under the additional constraint that the
test and reference group are balanced, $n_T^* = n_R^*$. Our result (5.2) shows that this restriction does
not lead to an approximately optimal allocation, in general. For example, Kieser & Friede (2007)
derive that the allocation 2.1 : 2.1 : 1 would be optimal for $\pi_{0,P} = 0.1$, $\pi_{0,T} = \pi_{0,R} = 0.9$ and $\Delta = 0.6$,
whereas (5.3) yields an optimal allocation of 2.5 : 1.5 : 1, giving more weight to the test group
relative to the reference group and nearly the same to the placebo group. The allocation 2.5 : 1.5 : 1
and the allocation 2.1 : 2.1 : 1 result in a total required sample size of 79 and 89, respectively, when
a power $1-\beta$ of 80% is desired. Thus, our optimal allocation makes a further reduction of the total
sample size of about 12% possible in this specific setting. The sample size reductions which are
possible in other settings are illustrated in Figure 2, where the reduction for the optimal allocation
instead of a balanced and a 2:2:1 allocation, respectively, is presented for $\pi_{0,P} = 0.1$ and different
values of $\Delta$. For the 2:2:1 allocation we observe reductions between about 3% and 10%.
For the balanced allocation, reductions of up to 20% and more are possible. Thus, the 2:2:1
allocation is more appropriate than the balanced allocation. However, it can be further improved by
the optimal one (5.3).
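For the unrestricted formula (4.2) the total sample size is proportional to $\sigma_0^2$ for a fixed alternative, so the relative saving of one allocation over another is simply a ratio of variances. The Python sketch below illustrates this for the setting $\pi_{0,P} = 0.1$, $\pi_{0,T} = \pi_{0,R} = 0.9$, $\Delta = 0.6$; the function names are hypothetical, and because the sketch ignores the restricted variance estimation it does not reproduce the totals 79 and 89 quoted above.

```python
from math import sqrt


def binary_sigma0_sq(weights, pi, delta):
    """sigma_0^2 from (3.7) for binary endpoints; pi and weights are (T, R, P) tuples."""
    coef_sq = (1.0, delta ** 2, (1.0 - delta) ** 2)
    return sum(c * p * (1.0 - p) / w for c, p, w in zip(coef_sq, pi, weights))


def binary_optimal_weights(pi, delta):
    """Normalized optimal allocation (5.2)."""
    coef = (1.0, delta, abs(1.0 - delta))
    parts = [c * sqrt(p * (1.0 - p)) for c, p in zip(coef, pi)]
    return tuple(part / sum(parts) for part in parts)


pi = (0.9, 0.9, 0.1)  # (pi_T, pi_R, pi_P)
delta = 0.6
w_opt = binary_optimal_weights(pi, delta)  # 2.5 : 1.5 : 1 after normalization
var_opt = binary_sigma0_sq(w_opt, pi, delta)

for label, w in (("2:2:1", (0.4, 0.4, 0.2)), ("1:1:1", (1 / 3, 1 / 3, 1 / 3))):
    saving = 1.0 - var_opt / binary_sigma0_sq(w, pi, delta)
    print(label, round(100 * saving, 1), "% fewer samples with the optimal allocation")
```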



5.1.3. Planning a trial - applying the GSSP 



For binary distributed endpoints the weighted KL-divergence is given by

$$K(\zeta^{(0)}, \zeta, w) = \sum_{k=T,R,P} w_k\left[\pi_k^{(0)}\log\frac{\pi_k^{(0)}}{\pi_k} + \left(1-\pi_k^{(0)}\right)\log\frac{1-\pi_k^{(0)}}{1-\pi_k}\right] \tag{5.4}$$

with $\zeta = (\pi_T, \pi_R, \pi_P)$ and $\zeta^{(0)} = (\pi_T^{(0)}, \pi_R^{(0)}, \pi_P^{(0)})$. We restrict our investigations in the following to
the commonly used alternative $\pi_T^{(0)} = \pi_R^{(0)}$. To restrict the minimization problem of the weighted
KL-divergence to $H_{0,\pi_k}$ (2.1) we substitute $\pi_T = \Delta\pi_R + (1-\Delta)\pi_P$ in (5.4). We have minimized
the KL-divergence (5.4) in $\pi_R$ and $\pi_P$ by the Newton-Raphson algorithm, confer Section 3.2.3.
Note that this is a strictly convex optimization problem by Theorem 1 because

$$-E_{\pi^{(0)}}\left[\frac{\partial^2}{\partial\pi^2}\log f(\pi, X)\right] = \frac{\pi^{(0)}}{\pi^2} + \frac{1-\pi^{(0)}}{(1-\pi)^2} > 0$$

for any $\pi$ and $\pi^{(0)}$. This guarantees the existence of a unique minimizer and geometric convergence
of the Newton-Raphson algorithm. Based on the obtained results the limit $\sigma_{RML}$ of the restricted
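Table 5

Precision of sample size formula (4.3) and comparison to the results obtained by Kieser & Friede (2007) for an
aspired power of 80% at significance level $\alpha = 2.5\%$.

| $w_T : w_R : w_P$ | $\Delta$ | $\pi_{0,P}$ | $\pi_{0,R}$ | Kieser & Friede (2007): $n$ | Kieser & Friede (2007): exact power | Eq. (4.3) with exact limit $\sigma_{RML}$: $n$ | Eq. (4.3) with exact limit $\sigma_{RML}$: exact power |
|---|---|---|---|---|---|---|---|
| 1:1:1 | 0.6 | 0.1 | 0.5 | 309 | 78.94% | [illegible] | [illegible] |
| 1:1:1 | 0.6 | 0.1 | 0.7 | 135 | 81.51% | 132 | 80.77% |
| 1:1:1 | 0.6 | 0.1 | 0.9 | 54 | 83.05% | 53 | 80.49% |
| 1:1:1 | 0.6 | 0.3 | 0.7 | 318 | 81.17% | 312 | 80.45% |
| 1:1:1 | 0.6 | 0.3 | 0.9 | 99 | 83.92% | 94 | 81.52% |
| 1:1:1 | 0.6 | 0.5 | 0.9 | 213 | 84.95% | 195 | 81.43% |
| 1:1:1 | 0.8 | 0.1 | 0.7 | 606 | 81.74% | 583 | 80.18% |
| 1:1:1 | 0.8 | 0.1 | 0.9 | 201 | 85.57% | 182 | 81.14% |
| 1:1:1 | 0.8 | 0.3 | 0.9 | 345 | 85.39% | 309 | 81.08% |
| 1:1:1 | 0.8 | 0.5 | 0.9 | 726 | 84.74% | 653 | 80.51% |
| 2:2:1 | 0.6 | 0.1 | 0.5 | 270 | 78.59% | [illegible] | [illegible] |
| 2:2:1 | 0.6 | 0.1 | 0.7 | 115 | 79.96% | 119 | 80.62% |
| 2:2:1 | 0.6 | 0.1 | 0.9 | 50 | 84.71% | 49 | 80.71% |
| 2:2:1 | 0.6 | 0.3 | 0.7 | 290 | 80.73% | 287 | 80.02% |
| 2:2:1 | 0.6 | 0.3 | 0.9 | 95 | 84.25% | 89 | 80.82% |
| 2:2:1 | 0.6 | 0.5 | 0.9 | 213 | 86.06% | 186 | 81.11% |
| 2:2:1 | 0.8 | 0.1 | 0.7 | 510 | 81.69% | 492 | 80.15% |
| 2:2:1 | 0.8 | 0.1 | 0.9 | 170 | 85.42% | 156 | 81.99% |
| 2:2:1 | 0.8 | 0.3 | 0.9 | 300 | 85.51% | 269 | 81.09% |
| 2:2:1 | 0.8 | 0.5 | 0.9 | [illegible] | [illegible] | 575 | 80.88% |
| 3:2:1 | 0.6 | 0.1 | 0.5 | 252 | 78.15% | 268 | 80.49% |
| 3:2:1 | 0.6 | 0.1 | 0.7 | 108 | 80.54% | 110 | 81.05% |
| 3:2:1 | 0.6 | 0.1 | 0.9 | 42 | 80.12% | 45 | 83.09% |
| 3:2:1 | 0.6 | 0.3 | 0.7 | 276 | 80.97% | 272 | 80.31% |
| 3:2:1 | 0.6 | 0.3 | 0.9 | 90 | 85.70% | 83 | 81.07% |
| 3:2:1 | 0.6 | 0.5 | 0.9 | 204 | 87.31% | 173 | 80.65% |
| 3:2:1 | 0.8 | 0.1 | 0.7 | 486 | 82.51% | 458 | 80.17% |
| 3:2:1 | 0.8 | 0.1 | 0.9 | 156 | 87.36% | 135 | 81.75% |
| 3:2:1 | 0.8 | 0.3 | 0.9 | 282 | 87.17% | 241 | 81.21% |
| 3:2:1 | 0.8 | 0.5 | 0.9 | 606 | 86.02% | 520 | 80.30% |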

MLEs of the variance is computed and compared to the true variance $\sigma_0$, see Table 4, columns 6
and 9. Throughout Table 4 we used $\Delta = 0.7$ as an example.

We may use (4.2) and (4.3), respectively, to compute the total required sample sizes. The results
are also displayed in Table 4 for a power $1-\beta$ of 0.7 and 0.8, respectively, for the optimal allocation,
displayed in columns 3-5 of Table 4, and the commonly used 2:2:1 allocation, in order to
illustrate the influence of the allocation on the total required sample size. The sample size values
in brackets are determined by (4.2), i.e. the RET is performed with unrestricted estimation of the
variance, $\hat\sigma^2 = \hat\sigma^2_{ML}$, and the values in front without brackets are determined by (4.3), i.e. the
RET is performed with restricted estimation of the variance, $\hat\sigma^2 = \hat\sigma^2_{RML}$. For large sample sizes the
differences between both values are relatively small, whereas for small to moderate sample sizes
($n < 200$) the differences are more pronounced. The size of the difference is driven by the difference
between $\sigma_{RML}$ and $\sigma_0$, see again Table 4, columns 6 and 9.

It is important to note that these results differ from those obtained by Kieser & Friede (2007).
This is due to the fact that for the computation of $\sigma^2_{RML}$ we have used the limit of the restricted
MLE $\hat\sigma^2_{RML}$ instead of choosing an arbitrary parameter constellation on the boundary of $H_{0,\pi_k}$.
We will see that the usage of the exact limit $\sigma^2_{RML}$ improves significantly the precision of the
sample size formula (4.3). To this end, we have determined the required total sample size $n$ via
(4.3) with usage of the exact limit $\sigma^2_{RML}$ to obtain a power of 80% at level $\alpha = 2.5\%$ (in order
to be comparable with the results obtained by Kieser & Friede (2007)) for different parameter
settings and allocations, and thereafter we have computed the resulting exact power (see Table
5). Note that we always have rounded down the group sample sizes $n_k$, $k = T, R, P$. The results
obtained by Kieser & Friede (2007), who have not used the exact limit $\sigma_{RML}$, are displayed for
comparison. Kieser & Friede (2007) obtain an exact power that increases to 85% or even to 87%
for some settings although $n > 200$, whereas the power decreases to 78% for other settings.
In contrast, our method results in power values between 80% and 82% for all settings (with one
exception for the case $w_T : w_R : w_P = 3 : 2 : 1$, $\Delta = 0.6$, $\pi_{0,P} = 0.1$ and $\pi_{0,R} = 0.9$ due to the
small total sample size of 45). In summary, we find that our approximate formula yields very
satisfactory results over a broad range of scenarios.

Fig 3. Example of Poisson distributed endpoints: Sample size reduction in % when optimal allocation is used
instead of the balanced allocation (right figure) and instead of the allocation 2:2:1 (left figure) for different values
of $\Delta$, $\Delta = 0.5$ (dotted line), $\Delta = 0.6$ (dashed line), $\Delta = 0.7$ (dotdash line), $\Delta = 0.8$ (solid line).
[x-axis: $\lambda_{0,R}/\lambda_{0,P}$]
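The GSSP of Figure 1 for binary endpoints can be sketched in Python as follows. This is an illustration under assumptions, not the accompanying R software: instead of the Newton-Raphson algorithm used in this paper, the weighted KL-divergence (5.4) is minimized on the boundary of the null hypothesis by a nested bisection on a Lagrange multiplier `mu`, the search bracket $[-50, 50]$ for `mu` is an ad hoc choice, and all function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

GROUPS = "TRP"


def bisect(func, lo, hi, iters=200):
    """Bisection for a monotone function with a sign change on [lo, hi]."""
    lo_positive = func(lo) > 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if (func(mid) > 0) == lo_positive:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def restricted_limit_binary(pi0, w, delta):
    """Minimizer of (5.4) subject to pi_T - delta*pi_R + (delta - 1)*pi_P = 0."""
    c = {"T": 1.0, "R": -delta, "P": delta - 1.0}

    def pi_of_mu(k, mu):
        # Stationarity in group k: w_k (p - pi0_k) / (p (1 - p)) + mu * c_k = 0
        return bisect(lambda p: w[k] * (p - pi0[k]) / (p * (1.0 - p)) + mu * c[k],
                      1e-12, 1.0 - 1e-12)

    mu = bisect(lambda m: sum(c[k] * pi_of_mu(k, m) for k in GROUPS), -50.0, 50.0)
    return {k: pi_of_mu(k, mu) for k in GROUPS}


def gssp_binary(pi0, w, delta, alpha=0.05, power=0.8):
    coef_sq = {"T": 1.0, "R": delta ** 2, "P": (1.0 - delta) ** 2}
    eta0 = pi0["T"] - delta * pi0["R"] + (delta - 1.0) * pi0["P"]       # step 1
    sigma0 = sqrt(sum(coef_sq[k] * pi0[k] * (1.0 - pi0[k]) / w[k]
                      for k in GROUPS))                                 # step 2
    pi_h0 = restricted_limit_binary(pi0, w, delta)                      # steps 3-4
    sigma_rml = sqrt(sum(coef_sq[k] * pi_h0[k] * (1.0 - pi_h0[k]) / w[k]
                         for k in GROUPS))                              # step 5
    z_a = NormalDist().inv_cdf(1.0 - alpha)
    z_b = NormalDist().inv_cdf(power)
    n_total = ceil(((z_a * sigma_rml + z_b * sigma0) / eta0) ** 2)      # formula (4.3)
    return sigma0, sigma_rml, n_total


# Setting of Table 4: pi_P = 0.1, pi_T = pi_R = 0.9, Delta = 0.7, optimal allocation
sigma0, sigma_rml, n_total = gssp_binary(
    {"T": 0.9, "R": 0.9, "P": 0.1}, {"T": 0.5, "R": 0.35, "P": 0.15}, delta=0.7)
print(round(sigma0 / sigma_rml, 3), n_total)
```

For this setting the sketch should agree with the corresponding row of Table 4 up to rounding, which offers a quick check of an independent implementation.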

5.2. Poisson endpoints: Treatment of epilepsy 

In this section, we revisit the example in the treatment of epilepsy introduced in Section 2.2. 



5.2.1. Performing the RET 

The MLE $\hat\lambda_k$ is obtained by the mean value $n_k^{-1}\sum_{i=1}^{n_k} X_{ki}$, which is asymptotically normally
distributed with variance $\sigma_k^2 = \lambda_k$. The unrestricted MLE of the variance $\sigma^2$ is obtained by

$$\hat\sigma^2_{ML} = n\left(\frac{\hat\lambda_T}{n_T} + \Delta^2\,\frac{\hat\lambda_R}{n_R} + (1-\Delta)^2\,\frac{\hat\lambda_P}{n_P}\right).$$

Hence, we end up with the test statistic (see (3.4))

$$T = \frac{-\hat\lambda_T + \Delta\,\hat\lambda_R + (1-\Delta)\,\hat\lambda_P}{\sqrt{\dfrac{\hat\lambda_T}{n_T} + \Delta^2\,\dfrac{\hat\lambda_R}{n_R} + (1-\Delta)^2\,\dfrac{\hat\lambda_P}{n_P}}} \tag{5.5}$$

in order to test $H_{0,-\lambda_k}$ from (2.2), where $H_{0,-\lambda_k}$ is rejected if $T > z_{1-\alpha}$. The restricted version
of the Wald-type test is obtained by replacing the MLEs in the denominator by the ones restricted
to $H_{0,-\lambda_k}$. Again, we have computed the restricted MLEs numerically as for binary endpoints
in the previous section.

The RET for the hypothesis (2.2) with $\Delta = 0.5$ yields $T = 1.328$ $(1.349)$ in (5.5) using the
restricted (unrestricted) estimator for the variance estimation, with corresponding p-values 9.21%
(8.86%). Thus, we would not reject $H_{0,-\lambda_k}$ from (2.2) at level $\alpha = 0.05$ and we could not claim
that the test treatment is non-inferior to the reference one.
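5.2.2. Optimal allocation

Table 6

Optimal allocation of samples for the example of Poisson distributed endpoints

| $\lambda_{0,T}/\lambda_{0,P} = \lambda_{0,R}/\lambda_{0,P}$ | $\Delta = 0.5$: $w_T^*$ | $\Delta = 0.5$: $w_R^*$ | $\Delta = 0.5$: $w_P^*$ | $\Delta = 0.7$: $w_T^*$ | $\Delta = 0.7$: $w_R^*$ | $\Delta = 0.7$: $w_P^*$ | $\Delta = 0.8$: $w_T^*$ | $\Delta = 0.8$: $w_R^*$ | $\Delta = 0.8$: $w_P^*$ |
|---|---|---|---|---|---|---|---|---|---|
| 0.9 | 0.49 | 0.25 | 0.26 | 0.50 | 0.35 | 0.16 | 0.50 | 0.40 | 0.10 |
| 0.8 | 0.49 | 0.24 | 0.27 | 0.49 | 0.34 | 0.16 | 0.49 | 0.40 | 0.11 |
| 0.7 | 0.48 | 0.24 | 0.28 | 0.49 | 0.34 | 0.17 | 0.49 | 0.39 | 0.12 |
| 0.6 | 0.47 | 0.23 | 0.30 | 0.48 | 0.34 | 0.19 | 0.49 | 0.39 | 0.13 |
| 0.5 | 0.45 | 0.23 | 0.32 | 0.47 | 0.33 | 0.20 | 0.48 | 0.38 | 0.14 |
| 0.3 | 0.41 | 0.21 | 0.38 | 0.44 | 0.31 | 0.24 | 0.46 | 0.37 | 0.17 |
| 0.2 | 0.38 | 0.19 | 0.43 | 0.42 | 0.30 | 0.28 | 0.44 | 0.36 | 0.20 |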

For Poisson distributed endpoints and the hypothesis (2.2) the optimal allocation of samples is
given by

$$n_T : n_R : n_P = 1 : \Delta\sqrt{\frac{\lambda_{0,R}}{\lambda_{0,T}}} : |1-\Delta|\sqrt{\frac{\lambda_{0,P}}{\lambda_{0,T}}}. \tag{5.6}$$

Table 6 presents the optimal allocation for the commonly used alternative $\lambda_{0,T} = \lambda_{0,R}$ for different
choices of $\lambda_{0,T}/\lambda_{0,P} = \lambda_{0,R}/\lambda_{0,P}$ and $\Delta$. Note that we may assume w.l.o.g. $\lambda_{0,P} = 1$ because
multiplication of all parameters $\lambda_{0,k}$, $k = T, R, P$, by the same factor does not change the optimal
allocation. This simplifies computation significantly. The sample size reductions which are possible
are illustrated in Figure 3, where the reduction for using the optimal allocation instead of a balanced
and a 2:2:1 allocation, respectively, is presented for different values of $\Delta$. The results are quite
similar to the ones for binary endpoints in the previous section.

Table 7

Example of Poisson distributed endpoints: Limits of restricted MLEs, limit of variance estimator $\sigma_{RML}$ and
required sample size to obtain a power of 0.7 and 0.8, respectively, when the variance is estimated restrictedly to
the null hypothesis (unrestrictedly), at a nominal significance level $\alpha = 5\%$, for different parameter constellations
and choices of $\Delta$ for the optimal sample allocation in (5.6).

| $\Delta$ | $\lambda_{0,T}/\lambda_{0,P} = \lambda_{0,R}/\lambda_{0,P}$ | $\lambda_{T,H_0}/\lambda_{0,P}$ | $\lambda_{R,H_0}/\lambda_{0,P}$ | $\lambda_{P,H_0}/\lambda_{0,P}$ | $\sigma_{RML}/\sqrt{\lambda_{0,P}}$ | $\sigma_0/\sqrt{\lambda_{0,P}}$ | $\sigma_{RML}/\sigma_0$ | $n_{0.7}\cdot\lambda_{0,P}$ | $n_{0.8}\cdot\lambda_{0,P}$ |
|---|---|---|---|---|---|---|---|---|---|
| 0.5 | 0.7 | 0.78 | 0.64 | 0.92 | 1.763 | 1.755 | 1.005 | 649 (645) | 852 (847) |
| 0.5 | 0.5 | 0.64 | 0.41 | 0.87 | 1.594 | 1.561 | 1.021 | 190 (184) | 248 (241) |
| 0.5 | 0.3 | 0.51 | 0.21 | 0.81 | 1.426 | 1.322 | 1.079 | 76 (68) | 98 (89) |
| 0.7 | 0.7 | 0.75 | 0.66 | 0.95 | 1.726 | 1.722 | 1.002 | 1729 (1724) | 2270 (2265) |
| 0.7 | 0.5 | 0.58 | 0.44 | 0.91 | 1.515 | 1.502 | 1.009 | 479 (472) | 628 (620) |
| 0.7 | 0.3 | 0.42 | 0.23 | 0.86 | 1.278 | 1.231 | 1.038 | 172 (162) | 224 (213) |
| 0.8 | 0.7 | 0.73 | 0.67 | 0.97 | 1.707 | 1.706 | 1.001 | 3810 (3805) | 5004 (4999) |
| 0.8 | 0.5 | 0.55 | 0.46 | 0.94 | 1.479 | 1.473 | 1.004 | 1028 (1021) | 1349 (1342) |
| 0.8 | 0.3 | 0.38 | 0.25 | 0.90 | 1.210 | 1.186 | 1.020 | 348 (338) | 456 (444) |



5.2.3. Planning a trial - applying the GSSP 

For Poisson distributed endpoints the weighted KL-divergence is given by

$$K(\zeta^{(0)}, \zeta, w) = \sum_{k=T,R,P} w_k\left(\lambda_k - \lambda_k^{(0)} + \lambda_k^{(0)}\left(\log\lambda_k^{(0)} - \log\lambda_k\right)\right) \tag{5.7}$$

with $\zeta = (\lambda_T, \lambda_R, \lambda_P)$ and $\zeta^{(0)} = (\lambda_T^{(0)}, \lambda_R^{(0)}, \lambda_P^{(0)})$. In the following we restrict our investigations to
the commonly used alternative $\lambda_T^{(0)} = \lambda_R^{(0)}$. To restrict the minimization problem of the weighted
KL-divergence to the boundary of $H_{0,-\lambda_k}$ (2.2) we substitute $\lambda_T = \Delta\lambda_R + (1-\Delta)\lambda_P$ in (5.7).
For this situation, an explicit minimization of the KL-divergence is possible. To this end, we
set the derivatives of $K$ w.r.t. $\lambda_R$ and $\lambda_P$ to zero, which is extremely cumbersome and yields
a rather complex solution (see A.5). The KL-divergence minimizers over $H_{0,-\lambda_k}$, denoted by $\lambda_{k,H_0}$,
$k = T, R, P$, are displayed in Table 7 (columns 3-5) for different parameter constellations and
choices of $\Delta$. Based on these results the limit $\sigma_{RML}$ of the restricted MLEs of the variance is
computed (column 6) and compared to the true variance $\sigma_0^2$, see Table 7, columns 7 and 8. We
presumed throughout Table 7 the usage of the optimal allocation from Table 6. In addition, for
all parameter constellations the required total sample sizes $n_{0.7}$, $n_{0.8}$ to obtain a power of 0.7 and
0.8, respectively, are computed via (4.2) (values in brackets) and (4.3), respectively. Note that
$n_{1-\beta}\cdot\lambda_{0,P}$ is displayed in Table 7 and thus the displayed values have to be divided by $\lambda_{0,P}$ to
obtain the required total sample sizes.
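The planning steps for Poisson endpoints can be sketched numerically in Python. The sketch is an illustration and not the accompanying R software: the function name `poisson_plan` is hypothetical, and instead of the closed-form minimizer of Appendix A.5 it solves the stationarity conditions of (5.7) on the boundary of the null hypothesis with a Lagrange multiplier `mu`, found by bisection. It assumes $0 < \Delta < 1$ and an alternative with $\eta^{(0)} > 0$, and it uses the optimal allocation (5.6).

```python
from math import ceil, sqrt
from statistics import NormalDist

GROUPS = "TRP"


def poisson_plan(lam0, delta, alpha=0.05, power=0.8, iters=200):
    """Sketch of the GSSP for Poisson endpoints and hypothesis (2.2)."""
    c = {"T": -1.0, "R": delta, "P": 1.0 - delta}            # eta = sum_k c_k * lambda_k
    parts = {k: abs(c[k]) * sqrt(lam0[k]) for k in GROUPS}
    w = {k: parts[k] / sum(parts.values()) for k in GROUPS}  # allocation (5.6)
    eta0 = sum(c[k] * lam0[k] for k in GROUPS)
    if eta0 <= 0:
        raise ValueError("lam0 must lie in the alternative (eta0 > 0)")
    sigma0 = sqrt(sum(c[k] ** 2 * lam0[k] / w[k] for k in GROUPS))

    def lam_h0(mu):
        # Stationarity of (5.7) with multiplier mu: w_k (1 - lam0_k / lam_k) + mu c_k = 0
        return {k: lam0[k] * w[k] / (w[k] + mu * c[k]) for k in GROUPS}

    # The constraint sum_k c_k lam_k(mu) is decreasing in mu with a root in (0, w_T)
    lo, hi = 0.0, w["T"] * (1.0 - 1e-12)
    for _ in range(iters):
        mu = 0.5 * (lo + hi)
        if sum(c[k] * value for k, value in lam_h0(mu).items()) > 0:
            lo = mu
        else:
            hi = mu
    lam_restricted = lam_h0(0.5 * (lo + hi))
    sigma_rml = sqrt(sum(c[k] ** 2 * lam_restricted[k] / w[k] for k in GROUPS))

    z_a = NormalDist().inv_cdf(1.0 - alpha)
    z_b = NormalDist().inv_cdf(power)
    n_unrestricted = ceil((z_a + z_b) ** 2 * (sigma0 / eta0) ** 2)            # (4.2)
    n_restricted = ceil(((z_a * sigma_rml + z_b * sigma0) / eta0) ** 2)       # (4.3)
    return lam_restricted, sigma0, sigma_rml, n_unrestricted, n_restricted


# First row of Table 7: Delta = 0.5, lambda_T = lambda_R = 0.7, lambda_P = 1
result = poisson_plan({"T": 0.7, "R": 0.7, "P": 1.0}, delta=0.5)
print(result)
```

With $\lambda_{0,P} = 1$ the returned totals can be compared directly with the columns $n_{0.8}\cdot\lambda_{0,P}$ of Table 7.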

6. Software

We provide the R source code of functions and documentation for planning and analyzing the
RET for various endpoints as supplementary material (File: RET.Package.pdf). This covers binary
(Section 5.1), Poisson (Section 5.2), normally (Pigeot et al., 2003) and censored, exponentially
distributed endpoints (Mielke et al., 2008). All provided functions have the following common
structure:

    RET.xx.yy( )              Performs the RET for given data
    RET.xx.yy.OptAlloc( )     Computes the optimal sample allocation for the RET
    RET.xx.yy.Samplesize( )   Determines the required sample sizes for the RET

where 'xx' specifies the distribution of the endpoints and 'yy' the retention of effect hypothesis.

7. Discussion

In this paper, we have presented a full analysis and planning of three-armed trials for general
retention of effect hypotheses. The endpoint of interest may follow any (regular) parametric
distribution family. As a major result, we have derived the asymptotically optimal allocation, see
Equation (4.1), and sample size formulas for planning the trial, (4.2) and (4.3), for restricted as
well as unrestricted estimation of the variance. To this end, the crucial step was the determination
of the exact limit $\sigma^2_{RML}$ of the restricted MLE of the variance $\sigma^2$, which to our knowledge has not
been investigated and incorporated in this context so far. As a consequence, note that for planning
a non-inferiority trial it is important to decide in advance which estimation method will be
used, as it affects the power and hence the total number of samples required.

For binomial endpoints this improves on existing procedures. This includes the precision of the
sample size formula as well as the issue of optimal allocation. The optimal allocation reduces the
total sample size by up to 10% (20%) compared to the 2:2:1 (balanced) allocation. In
addition, the methods of this paper are applied to Poisson endpoints, which to our knowledge have
not been investigated in the context of three-armed non-inferiority trials so far.

A problematic issue might be that the sample size planning and evaluation of a study presented
in this paper are based on asymptotic considerations. Thus, for finite samples the optimal
allocation could differ. In both examples investigated in this paper this is not the case; at least,
numerical studies show that the differences are negligibly small. However, differences could occur,
for example, when the ratio $\sigma_{RML}/\sigma_0$ is far away from 1 and the signal to noise ratio $\eta^{(0)}/\sigma_0$ is
very small.
Appendix A

A.1. Limit of the restricted MLE

Assumption 1: For $\zeta^{(0)}$ in the alternative $H_1$ and $n_k/n \to w_k \in\, ]0,1[$, $w = (w_T, w_R, w_P)$, the
minimum $\zeta_{H_0} = (\theta_{T,H_0}, \theta_{R,H_0}, \theta_{P,H_0}) = \arg\min_{\zeta \in H_0} K(\zeta^{(0)}, \zeta, w)$ is well-defined.

Assumption 2: For any sequence $\zeta^{(n)} = (\theta_T^{(n)}, \theta_R^{(n)}, \theta_P^{(n)})$ in $H_0$ with $\lim_{n\to\infty}\zeta^{(n)} \in \overline{\Theta}^3 \setminus \Theta^3$ or
with $\lim_{n\to\infty}\|\zeta^{(n)}\| = \infty$,

$$\lim_{n\to\infty}\prod_{k=T,R,P} f(\theta_k^{(n)}, x_k) = 0$$

holds almost everywhere.

The next theorem shows that the restricted MLE converges to the minimizer of the sample
size weighted Kullback-Leibler-divergence (KL-divergence) with respect to the true parameter,
denoted by $\theta_{k,H_0}$, $k = T, R, P$.

Theorem 2: Let $\hat\zeta_{H_0}$ denote the MLE restricted to $H_0$. Then under the Assumptions 1 and 2

$$\hat\zeta_{H_0} \xrightarrow{\;a.s.\;} \zeta_{H_0}.$$

Proof. Let

$$Q_n(\zeta) = -\sum_{k=T,R,P}\frac{1}{n}\sum_{i=1}^{n_k}\log f(X_{ki}, \theta_k)$$

and

$$Q(\zeta) = -\sum_{k=T,R,P} w_k\, E_{\theta_k^{(0)}}\left[\log f(X_{k1}, \theta_k)\right].$$

Note that by definition

$$K(\zeta^{(0)}, \zeta, w) = Q(\zeta) - Q(\zeta^{(0)})$$

holds and consequently $\zeta_{H_0} = \arg\min_{\zeta \in H_0} K(\zeta^{(0)}, \zeta, w)$ is also the well-defined minimizer of $Q(\zeta)$
in $H_0$.

Assumption 2 ensures that the MLE is asymptotically almost surely located in a compact set,
i.e. there exists a compact subset of $H_0$ such that the MLE restricted to this subset has almost
surely the same limit as the MLE restricted to $H_0$.

A proof for $\lim_{n\to\infty}\|\zeta^{(n)}\| = \infty$ can be found in Wald (1949). For $\lim_{n\to\infty}\zeta^{(n)}$ in
$\overline{\Theta}^3 \setminus \Theta^3$ this can be proved analogously. Hence, we assume w.l.o.g. that $H_0$ is compact. Therefore,
the convergence

$$Q_n(\zeta) \xrightarrow{\;a.s.\;} Q(\zeta)$$

is uniform in $H_0$ (see Jennrich, 1969, Theorem 2) and we can apply Lemma 2.2 from White
(1980), which yields that $\hat\zeta_{H_0} = \arg\min_{\zeta \in H_0} Q_n(\zeta)$ converges almost surely to the well-defined
minimum of $Q(\zeta)$ in $H_0$. □

A.2. Proof of Theorem 1

The condition $-E_{\theta^{(0)}}\left[\frac{\partial^2}{\partial\theta^2}\log f(\theta, X)\right] > 0$ ensures that the KL-divergence $K(\theta_k^{(0)}, \theta)$ is a convex
function in $\theta$ for $\theta_k^{(0)}$, $k = T, R, P$. Thus, the weighted KL-divergence

$$K(\zeta^{(0)}, (\theta_T, \theta_R, \theta_P), c) = \sum_{k=T,R,P} c_k \cdot K(\theta_k^{(0)}, \theta_k)$$

is convex in the arguments $\theta_k$, $k = T, R, P$. Let us denote

$$g(\theta_R, \theta_P) = h^{-1}\left(\Delta h(\theta_R) + (1-\Delta) h(\theta_P)\right),$$

which is an affine transformation in both arguments by assumption. Hence,

$$K(\theta_T^{(0)}, g(\theta_R, \theta_P))$$

is a convex function in $(\theta_R, \theta_P)$. Therefore, the weighted KL-divergence with restriction to the
boundary of the null hypothesis $H_{0,h(\theta_k)}$, represented by

$$K(\zeta^{(0)}, (g(\theta_R, \theta_P), \theta_R, \theta_P), c) = c_T \cdot K(\theta_T^{(0)}, g(\theta_R, \theta_P)) + c_R \cdot K(\theta_R^{(0)}, \theta_R) + c_P \cdot K(\theta_P^{(0)}, \theta_P),$$

is a linear combination of convex functions with nonnegative weights and is therefore again convex in $(\theta_R, \theta_P)$.

□

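A.3. Minimization of $\sigma_0^2$ as a function of sample allocation

As the sample allocation has to fulfill $w_T + w_R + w_P = 1$ we substitute $w_P = 1 - w_T - w_R$ in (3.7)
and obtain

$$\sigma_0^2 = \frac{\sigma^2_{0,T}}{w_T} + \frac{\Delta^2\cdot\sigma^2_{0,R}}{w_R} + \frac{(1-\Delta)^2\cdot\sigma^2_{0,P}}{1 - w_T - w_R}.$$

Note that $\sigma_0^2$ is a convex function in $(w_T, w_R)$. Setting the derivatives of $\sigma_0^2$ w.r.t. $w_T$ and $w_R$
to zero yields

$$\frac{\partial\sigma_0^2}{\partial w_T} = \frac{(1-\Delta)^2\cdot\sigma^2_{0,P}}{(1 - w_R - w_T)^2} - \frac{\sigma^2_{0,T}}{w_T^2} = 0,$$

$$\frac{\partial\sigma_0^2}{\partial w_R} = \frac{(1-\Delta)^2\cdot\sigma^2_{0,P}}{(1 - w_R - w_T)^2} - \frac{\Delta^2\cdot\sigma^2_{0,R}}{w_R^2} = 0.$$

Solving the equations for $w_T$ and $w_R$ yields the minimizer

$$w_T^* = \frac{\sigma_{0,T}}{\sigma_{0,T} + \Delta\cdot\sigma_{0,R} + |1-\Delta|\cdot\sigma_{0,P}}, \qquad w_R^* = \frac{\Delta\cdot\sigma_{0,R}}{\sigma_{0,T} + \Delta\cdot\sigma_{0,R} + |1-\Delta|\cdot\sigma_{0,P}}$$

and therewith

$$w_P^* = \frac{|1-\Delta|\cdot\sigma_{0,P}}{\sigma_{0,T} + \Delta\cdot\sigma_{0,R} + |1-\Delta|\cdot\sigma_{0,P}}.$$

Thus, the optimal allocation in terms of minimizing the variance $\sigma_0^2$ is given by

$$n_T : n_R : n_P = w_T^* : w_R^* : w_P^* = 1 : \Delta\,\frac{\sigma_{0,R}}{\sigma_{0,T}} : |1-\Delta|\,\frac{\sigma_{0,P}}{\sigma_{0,T}}.$$

□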
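As a cross-check of this result (an alternative argument, not the derivation used above), the minimizer and the minimal variance also follow from the Cauchy-Schwarz inequality. The abbreviations $a_T = \sigma_{0,T}$, $a_R = \Delta\,\sigma_{0,R}$ and $a_P = |1-\Delta|\,\sigma_{0,P}$ are introduced here only for this purpose.

```latex
\sigma_0^2
  = \sum_{k=T,R,P} \frac{a_k^2}{w_k}
  = \Big(\sum_{k} \frac{a_k^2}{w_k}\Big)\Big(\sum_{k} w_k\Big)
  \;\ge\; \Big(\sum_{k} \frac{a_k}{\sqrt{w_k}}\,\sqrt{w_k}\Big)^{2}
  = \big(\sigma_{0,T} + \Delta\,\sigma_{0,R} + |1-\Delta|\,\sigma_{0,P}\big)^{2},
\qquad \text{since } \sum_{k} w_k = 1 .
```

Equality holds if and only if $a_k/w_k$ is the same for all $k$, i.e. $w_k^* = a_k / (a_T + a_R + a_P)$, which reproduces both the allocation above and the minimal variance stated after (4.1).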



A.4. Comparison of the variance $\sigma_0^2$ for different allocations

Theorem 3: If $\theta_T^{(0)} = \theta_R^{(0)}$ and $\sigma^2_{0,P}/\sigma^2_{0,T} < 2.12$ then the allocation $1 : \Delta : (1-\Delta)$ results in a
smaller variance $\sigma_0^2$ (3.7) (and hence larger asymptotic power) than the allocation 2:2:1 for any
$0 < \Delta < 1$.

Proof. Substituting the allocation 2:2:1 and $1 : \Delta : (1-\Delta)$, respectively, and $\theta_T^{(0)} = \theta_R^{(0)}$ (so that
$\sigma_{0,R} = \sigma_{0,T}$) in the variance $\sigma_0^2$ from (3.7) yields

$$\sigma^2_{2:2:1} = \frac{5 + 5\Delta^2}{2}\,\sigma^2_{0,T} + 5(1-\Delta)^2\,\sigma^2_{0,P}$$

and

$$\sigma^2_{1:\Delta:1-\Delta} = 2(1+\Delta)\,\sigma^2_{0,T} + 2(1-\Delta)\,\sigma^2_{0,P}.$$

Thus, we obtain with $r := \sigma^2_{0,P}/\sigma^2_{0,T} > 0$

$$g(\Delta, r) := \frac{\sigma^2_{2:2:1} - \sigma^2_{1:\Delta:1-\Delta}}{\sigma^2_{0,T}} = (2.5 + 5r)\,\Delta^2 + (-2 - 8r)\,\Delta + (0.5 + 3r),$$

which is a quadratic function in $\Delta$ with minimum

$$(a(r), b(r)) = \left(\frac{2 + 8r}{5 + 10r},\; \frac{-4\,(r - 2.11803)(r + 0.118034)}{10 + 20r}\right),$$

where $0 < a(r) < 1$ and $b(r) > 0$ for $r < 2.11803 \approx 2.12$. Thus, we obtain for $r = \sigma^2_{0,P}/\sigma^2_{0,T} < 2.12$
that $g(\Delta, r) > 0$, which implies $\sigma^2_{2:2:1} > \sigma^2_{1:\Delta:1-\Delta}$ for any $0 < \Delta < 1$.

□

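Theorem 4: If $\theta_T^{(0)} = \theta_R^{(0)}$ and $\sigma^2_{0,P}/\sigma^2_{0,T} < 2.73$ then the allocation $1 : \Delta : (1-\Delta)$ results in a
smaller variance $\sigma_0^2$ (3.7) (and hence larger asymptotic power) than the balanced allocation 1:1:1
for any $0 < \Delta < 1$.

Proof. Substituting the allocation 1:1:1 and $1 : \Delta : (1-\Delta)$, respectively, and $\theta_T^{(0)} = \theta_R^{(0)}$ (so that
$\sigma_{0,R} = \sigma_{0,T}$) in the variance $\sigma_0^2$ from (3.7) yields

$$\sigma^2_{1:1:1} = (3 + 3\Delta^2)\,\sigma^2_{0,T} + 3(1-\Delta)^2\,\sigma^2_{0,P}$$

and

$$\sigma^2_{1:\Delta:1-\Delta} = 2(1+\Delta)\,\sigma^2_{0,T} + 2(1-\Delta)\,\sigma^2_{0,P}.$$

Thus, we obtain with $r := \sigma^2_{0,P}/\sigma^2_{0,T} > 0$

$$g(\Delta, r) := \frac{\sigma^2_{1:1:1} - \sigma^2_{1:\Delta:1-\Delta}}{\sigma^2_{0,T}} = (3 + 3r)\,\Delta^2 + (-2 - 4r)\,\Delta + (1 + r),$$

which is a quadratic function in $\Delta$ with minimum

$$(a(r), b(r)) = \left(\frac{2 + 4r}{6 + 6r},\; \frac{-(r - 1 + \sqrt{3})(r - 1 - \sqrt{3})}{3 + 3r}\right),$$

where $0 < a(r) < 1$ and $b(r) > 0$ for $r < 1 + \sqrt{3} \approx 2.73$. Thus, we obtain for $r = \sigma^2_{0,P}/\sigma^2_{0,T} < 2.73$
that $g(\Delta, r) > 0$, which implies $\sigma^2_{1:1:1} > \sigma^2_{1:\Delta:1-\Delta}$ for any $0 < \Delta < 1$.

□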
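The thresholds in Theorems 3 and 4 can be checked numerically on a grid. The Python sketch below is an illustrative check under the assumptions of the theorems ($\theta_T^{(0)} = \theta_R^{(0)}$, variances scaled so that $\sigma^2_{0,T} = 1$) and not part of the proofs; the function names are hypothetical.

```python
from math import sqrt


def sigma0_sq(weights, delta, r):
    """sigma_0^2 / sigma_{0,T}^2 from (3.7) with sigma_{0,R} = sigma_{0,T}
    and r = sigma_{0,P}^2 / sigma_{0,T}^2."""
    w_t, w_r, w_p = weights
    return 1.0 / w_t + delta ** 2 / w_r + (1.0 - delta) ** 2 * r / w_p


def rule_of_thumb(delta):
    """Normalized weights of the allocation 1 : delta : (1 - delta)."""
    return (0.5, delta / 2.0, (1.0 - delta) / 2.0)


def worst_gap(reference, r, steps=999):
    """Smallest value of g(delta, r) over an equidistant grid of delta in (0, 1)."""
    gaps = []
    for i in range(1, steps + 1):
        delta = i / (steps + 1)
        gaps.append(sigma0_sq(reference, delta, r)
                    - sigma0_sq(rule_of_thumb(delta), delta, r))
    return min(gaps)


r_221 = 1.0 + sqrt(5.0) / 2.0       # positive root of r^2 - 2r - 0.25, about 2.118
r_balanced = 1.0 + sqrt(3.0)        # about 2.732

print(worst_gap((0.4, 0.4, 0.2), r_221 - 0.1) > 0)            # rule of thumb better
print(worst_gap((0.4, 0.4, 0.2), r_221 + 0.1) > 0)            # no longer for all delta
print(worst_gap((1 / 3, 1 / 3, 1 / 3), r_balanced - 0.1) > 0)
print(worst_gap((1 / 3, 1 / 3, 1 / 3), r_balanced + 0.1) > 0)
```

Below the respective threshold the gap stays positive for every grid value of $\Delta$, whereas slightly above it some values of $\Delta$ favour the 2:2:1 or the balanced allocation.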



A.5. Poisson example: weighted KL-divergence minimizer

An analytical solution to the minimization of the KL-divergence $K(\zeta^{(0)}, \zeta, w)$ for Poisson endpoints
in Section 5.2.3 can be obtained by setting the derivatives of the KL-divergence (5.7) w.r.t. $\lambda_R$ and
$\lambda_P$ to zero after substituting $\lambda_T = \Delta\lambda_R + (1-\Delta)\lambda_P$, which yields (with $\lambda_{0,R} = \lambda_{0,T}$ and
$w_P = 1 - w_T - w_R$)

$$\lambda_{R,H_0} = \frac{\begin{aligned}&\Delta^2(-1+w_T)w_T\lambda_{0,P} - \Delta(-1+w_T)w_T(\lambda_{0,P}-\lambda_{0,T}) + w_R^2\big((-1+\Delta)\lambda_{0,P} + (2-\Delta)\lambda_{0,T}\big)\\&\quad + w_R\big((-1+\Delta)(-1+w_T+\Delta w_T)\lambda_{0,P} + (-\Delta+w_T+2\Delta w_T-\Delta^2 w_T)\lambda_{0,T}\big) - S\end{aligned}}{2\,(w_R+\Delta(-1+w_T))(w_R+\Delta w_T)}$$

$$\lambda_{P,H_0} = \frac{\begin{aligned}&w_R^2\lambda_{0,P} + \Delta^2 w_T\big((-1+w_R+w_T)\lambda_{0,P} - w_R\lambda_{0,T}\big) + w_R\big((-1+w_T)\lambda_{0,P} - w_T\lambda_{0,T}\big)\\&\quad + \Delta\big((2+w_R^2-3w_T+w_T^2+w_R(-3+2w_T))\lambda_{0,P} + (w_R-w_R^2+w_T-w_T^2)\lambda_{0,T}\big) - S\end{aligned}}{2\,\big((-1+w_R)w_R + \Delta^2(-1+w_T)w_T + \Delta(1-w_T+w_R(-1+2w_T))\big)}$$

$$\lambda_{T,H_0} = \Delta\lambda_{R,H_0} + (1-\Delta)\lambda_{P,H_0}$$

with

$$\begin{aligned}S = \Big\{&-4\Delta(-1+w_R+w_T)\big((-1+w_R)w_R + \Delta^2(-1+w_T)w_T + \Delta(1-w_T+w_R(-1+2w_T))\big)\,\lambda_{0,P}\\&\quad\cdot\big((-1+w_R+w_T)\lambda_{0,P} - (w_R+w_T)\lambda_{0,T}\big)\\&+ \Big(\Delta^2 w_T\big((-1+w_R+w_T)\lambda_{0,P} - w_R\lambda_{0,T}\big) + w_R\big((-1+w_R+w_T)\lambda_{0,P} - w_T\lambda_{0,T}\big)\\&\quad + \Delta\big((2-3w_R+w_R^2-3w_T+2w_Rw_T+w_T^2)\lambda_{0,P} + (w_R-w_R^2+w_T-w_T^2)\lambda_{0,T}\big)\Big)^2\Big\}^{1/2}.\end{aligned}$$